Deep Learning With Differential Privacy and the Saddle Point Accountant

Juan Gomez, Harvard University

Machine learning (ML) techniques based on neural networks achieve remarkable accuracy in a wide variety of domains. In applications that use human-generated data, the datasets used to train these models often contain private or sensitive information, so the careful design of ML models that do not expose that information is of vital importance. In this work, privacy is defined as the capacity of an adversary to infer information about a specific person or group from the outputs of a model. Differential privacy is a mathematical framework that uses this definition, together with information-theoretic techniques, to quantify the greatest possible information gained by an adversary; it further provides strict bounds under which individual privacy is provably guaranteed. Guaranteeing privacy when training ML models presents a unique challenge: training algorithms often query the training dataset thousands of times, and each successive query incurs additional privacy loss. Algorithms that track this accumulated loss are known as "accounting" algorithms, and how to account for privacy optimally in the limit of a large number of queries is an open problem in differential privacy. In this work, we propose the Saddle Point Accountant (SPA). We provide rigorous performance guarantees by deriving upper and lower bounds on the SPA's privacy-accounting approximation error. Numerical experiments applying the SPA to a differentially private stochastic gradient descent (DP-SGD) algorithm demonstrate that the SPA achieves accuracy comparable to state-of-the-art accounting methods, whose cost scales like sqrt(k) for k queries, while its own cost is independent of the number of queries.
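
To make the private-training setting above concrete, the sketch below shows a single DP-SGD-style update for a linear model under squared loss: each per-example gradient is clipped to a fixed L2 norm, and Gaussian noise is added before the averaged gradient is applied. This is a minimal illustration of the general DP-SGD recipe, not the experimental setup of the paper; the loss, clip norm, noise multiplier, and learning rate are illustrative assumptions.

    import numpy as np

    def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, sigma=1.0, rng=None):
        # One noisy gradient step on squared loss (illustrative parameters).
        rng = np.random.default_rng() if rng is None else rng
        clipped = []
        for xi, yi in zip(X, y):
            g = 2.0 * (xi @ w - yi) * xi                         # per-example gradient
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip / max(norm, 1e-12)))  # L2 clip to norm <= clip
        noise = rng.normal(0.0, sigma * clip, size=w.shape)      # Gaussian noise on the sum
        g_priv = (np.sum(clipped, axis=0) + noise) / len(X)      # noisy mean gradient
        return w - lr * g_priv

Each such step is one noisy query to the training data; running k steps composes k noisy releases, which is precisely the accounting problem described above.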

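The SPA rests on the saddle-point (steepest-descent) approximation applied to the distribution of the total privacy loss after k compositions. The sketch below applies a textbook saddle-point tail approximation (Bahadur-Rao form) to the unit-sensitivity Gaussian mechanism, whose per-query privacy loss is N(mu, 2*mu) with mu = 1/(2*sigma^2), so the k-fold cumulant generating function is K(t) = k*mu*t*(1 + t). This is a simplified stand-in for the paper's accountant, offered only to illustrate the key property claimed above: the cost of the approximation does not grow with k.

    import numpy as np

    def saddle_point_tail(eps, k, sigma):
        # Approximate P(total privacy loss > eps) after k queries to a
        # unit-sensitivity Gaussian mechanism via the saddle-point method.
        mu = 1.0 / (2.0 * sigma**2)                  # per-query mean privacy loss
        t_star = (eps / (k * mu) - 1.0) / 2.0        # solves K'(t*) = eps in closed form
        assert t_star > 0, "approximation valid in the tail eps > k*mu"
        K = k * mu * t_star * (1.0 + t_star)         # CGF at the saddle point
        d2K = 2.0 * k * mu                           # K''(t*)
        return np.exp(K - t_star * eps) / (t_star * np.sqrt(2.0 * np.pi * d2K))

    # Example: k = 100 queries with noise multiplier sigma = 1 gives
    # saddle_point_tail(80.0, 100, 1.0) ~ 1.5e-3, close to the exact
    # Gaussian tail value of about 1.35e-3.

Because t_star, K, and K'' are evaluated in closed form, the runtime of this approximation is constant in k, in contrast to accountants whose cost grows with the number of compositions.
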
Abstract Author(s): Juan Felipe Gomez, Wael Alghamdi, Flavio Calmon