Bridging the Privacy Accounting Gap in DP-SGD


Lynn Chua
https://orcid.org/0000-0002-5252-5277
Badih Ghazi
https://orcid.org/0009-0004-1555-5321
Charlie Harrison
https://orcid.org/0009-0003-5332-3145
Ethan Leeman
Pritish Kamath
https://orcid.org/0000-0002-4296-2393
Ravi Kumar
https://orcid.org/0000-0002-2203-2586
Pasin Manurangsi
https://orcid.org/0000-0002-1052-2801
Amer Sinha
https://orcid.org/0009-0001-9504-6970
Chiyuan Zhang

Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) is one of the most widely used algorithms for private machine learning. For efficiency, most practical implementations of DP-SGD shuffle the training examples and divide them into fixed-size mini-batches during training. However, the privacy accounting typically assumes Poisson subsampling, wherein each example is included in each mini-batch independently with some probability. Our first contribution is to show that there can be a substantial gap between these two versions of DP-SGD: the privacy accounting implies much stronger privacy guarantees than the implementations actually provide. As our second contribution, we propose two approaches to address this gap: (i) an implementation of Poisson subsampling using the Map-Reduce framework that scales to large datasets that do not fit in memory, and (ii) a novel Balls-and-Bins sampling that achieves the “best of both” sampling approaches: its implementation is similar to shuffling, and it yields similar utility for DP-SGD training with similar-or-better privacy compared to Poisson subsampling in practical regimes of parameters.
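The three batching schemes contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are ours, and the reading of Balls-and-Bins sampling as assigning each example to one uniformly random batch is an assumption based on the abstract's description.

```python
import random

def poisson_batches(n, num_batches, q, rng):
    # Poisson subsampling (assumed by the accounting): each example
    # joins each mini-batch independently with probability q, so
    # batch sizes are random rather than fixed.
    return [[i for i in range(n) if rng.random() < q]
            for _ in range(num_batches)]

def shuffled_batches(n, batch_size, rng):
    # Shuffling (used by most implementations): permute the examples
    # once, then split the permutation into fixed-size mini-batches.
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]

def balls_and_bins_batches(n, num_batches, rng):
    # Balls-and-Bins (hypothetical sketch): each example is thrown
    # into one uniformly random batch, so every example appears
    # exactly once per epoch while batch sizes concentrate around
    # n / num_batches.
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches
```

Note the key difference the paper exploits: under shuffling (and Balls-and-Bins) each example appears exactly once per epoch, whereas under Poisson subsampling an example may appear in zero or several mini-batches, which is what the standard privacy accounting assumes.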

Article Details

How to Cite
Chua, Lynn, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. 2025. “Bridging the Privacy Accounting Gap in DP-SGD”. Journal of Privacy and Confidentiality 15 (3). https://doi.org/10.29012/jpc.998.
Section
TPDP 2024