Bridging the Privacy Accounting Gap in DP-SGD

Lynn Chua; Badih Ghazi; Charlie Harrison; Ethan Leeman; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang

doi:10.29012/jpc.998

Published: Dec 31, 2025

Keywords:

Differential Privacy, DP-SGD, Shuffling, Poisson Subsampling, Balls-and-Bins sampling

Lynn Chua

Google Research

https://orcid.org/0000-0002-5252-5277

Badih Ghazi

Google Research

https://orcid.org/0009-0004-1555-5321

Charlie Harrison

Google

https://orcid.org/0009-0003-5332-3145

Ethan Leeman

Google

Pritish Kamath

Google Research

https://orcid.org/0000-0002-4296-2393

Ravi Kumar

Google Research

https://orcid.org/0000-0002-2203-2586

Pasin Manurangsi

Google Research

https://orcid.org/0000-0002-1052-2801

Amer Sinha

Google Research

https://orcid.org/0009-0001-9504-6970

Chiyuan Zhang

Google Research

Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) is one of the most widely used algorithms for private machine learning. Due to its efficiency, most practical implementations of DP-SGD shuffle the training examples and divide them into fixed-size mini-batches during training. However, the privacy accounting typically assumes that Poisson subsampling was used, wherein each example is included in each mini-batch independently with some probability. Our first contribution is to show that there can be a substantial gap between these two versions of DP-SGD; specifically, the privacy accounting implies much stronger privacy guarantees than the implementations actually provide. As our second contribution, we propose two approaches to address this gap: (i) an implementation of Poisson subsampling using the Map-Reduce framework that can scale to large datasets that do not fit in memory and (ii) a novel Balls-and-Bins sampling that achieves the “best of both” sampling approaches. Namely, its implementation is similar to shuffling, and it leads to similar utility for DP-SGD training with similar-or-better privacy compared to Poisson subsampling in practical regimes of parameters.
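The three batch-sampling schemes named in the abstract can be contrasted with a minimal sketch. This is an illustration, not the authors' implementation: the function names, the `rng` argument, and the parameters `n`, `batch_size`, `num_batches`, and `q` are hypothetical. The abstract does not define Balls-and-Bins sampling, so the sketch assumes the usual reading of the name, in which each example is placed independently into exactly one batch chosen uniformly at random.

```python
import random


def shuffle_batches(n, batch_size, rng):
    """Shuffle all example indices once, then cut into fixed-size batches."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]


def poisson_batches(n, num_batches, q, rng):
    """Poisson subsampling: each example joins each batch independently
    with probability q, so batch sizes vary and an example may appear in
    several batches or in none."""
    return [[i for i in range(n) if rng.random() < q]
            for _ in range(num_batches)]


def balls_and_bins_batches(n, num_batches, rng):
    """Balls-and-Bins sampling (assumed definition): each example is thrown
    into exactly one batch, chosen uniformly at random. Batch sizes vary,
    but every example is used exactly once per pass, as with shuffling."""
    batches = [[] for _ in range(num_batches)]
    for i in range(n):
        batches[rng.randrange(num_batches)].append(i)
    return batches


rng = random.Random(0)  # fixed seed so the illustration is reproducible
n, num_batches = 12, 4
shuffled = shuffle_batches(n, n // num_batches, rng)
poisson = poisson_batches(n, num_batches, 1 / num_batches, rng)
bins = balls_and_bins_batches(n, num_batches, rng)
```

In this sketch, shuffling and Balls-and-Bins both partition the dataset in each pass, while Poisson subsampling does not. That structural similarity is one way to read the abstract's remark that the Balls-and-Bins implementation is similar to shuffling.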

How to Cite

Chua, Lynn, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, and Chiyuan Zhang. 2025. “Bridging the Privacy Accounting Gap in DP-SGD”. Journal of Privacy and Confidentiality 15 (3). https://doi.org/10.29012/jpc.998.

Issue

Vol. 15 No. 3 (2025): Regular issue, including articles based on presentations at TPDP 2024

Section

TPDP 2024

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.
