Bridging the Privacy Accounting Gap in DP-SGD
Abstract
Differentially Private Stochastic Gradient Descent (DP-SGD) is one of the most widely used algorithms for private machine learning. For efficiency, most practical implementations of DP-SGD shuffle the training examples and divide them into fixed-size mini-batches during training. However, the privacy accounting typically assumes that Poisson subsampling was used, wherein each example is included in each mini-batch independently with a fixed probability. Our first contribution is to show that there can be a substantial gap between these two versions of DP-SGD; specifically, the privacy accounting implies much stronger privacy guarantees than the implementations actually provide. As our second contribution, we propose two approaches to address this gap: (i) an implementation of Poisson subsampling using the Map-Reduce framework that scales to large datasets that do not fit in memory, and (ii) a novel Balls-and-Bins sampling scheme that achieves the “best of both” sampling approaches. Namely, its implementation is similar to shuffling, and it yields utility for DP-SGD training similar to that of shuffling, with privacy similar to or better than that of Poisson subsampling in practical parameter regimes.
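
To make the difference between the batch samplers concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. The function names, the index-based dataset representation, and the parameters n, batch_size, num_batches, and q are all hypothetical. The abstract does not define Balls-and-Bins sampling; the sketch assumes it means that each example is assigned independently to one uniformly random batch.

```python
import random


def shuffle_batches(n, batch_size, rng):
    """Shuffle indices 0..n-1 once, then cut them into consecutive mini-batches.

    Every example appears in exactly one batch; all batches have batch_size
    elements, except possibly the last one when batch_size does not divide n.
    """
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n, batch_size)]


def poisson_batches(n, num_batches, q, rng):
    """Poisson subsampling: each example joins each batch independently
    with probability q, so batch sizes are random and an example may
    appear in several batches or in none."""
    return [[i for i in range(n) if rng.random() < q]
            for _ in range(num_batches)]


def balls_and_bins_batches(n, num_batches, rng):
    """Assumed Balls-and-Bins sampling: each example (ball) is dropped
    independently into one uniformly random batch (bin). Every example
    appears exactly once, but batch sizes are random."""
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print("shuffle:       ", shuffle_batches(12, 4, rng))
    print("poisson:       ", poisson_batches(12, 3, 4 / 12, rng))
    print("balls-and-bins:", balls_and_bins_batches(12, 3, rng))
```

Under this assumption, shuffling and Balls-and-Bins both place every example in exactly one batch per pass, which is why their implementations look alike, whereas Poisson subsampling draws each batch independently of the others. The sketch only illustrates how batches are formed; it does not perform gradient clipping, noise addition, or privacy accounting.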
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.