Journal of Privacy and Confidentiality <p>The <em>Journal of Privacy and Confidentiality</em>&nbsp;is an open-access multi-disciplinary journal whose purpose is to facilitate the coalescence of research methodologies and activities in the areas of privacy, confidentiality, and disclosure limitation. The JPC seeks to publish a wide range of research and review papers, not only from academia, but also from government (especially official statistical agencies) and industry, and to serve as a forum for exchange of views, discussion, and news.</p> en-US <p>Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the <a href="">Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)</a>, unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.</p> <p>Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.</p> (Lars Vilhuber and/or Rachel Cummings) (Lars Vilhuber (temporarily)) Mon, 24 Jun 2024 06:03:51 -0700 OJS 60 Private Query Release via the Johnson-Lindenstrauss Transform <p>We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.</p> Aleksandar Nikolov Copyright (c) 2024 Aleksandar Nikolov Mon, 24 Jun 2024 00:00:00 -0700 Differentially Private Synthetic Control <p>Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as provide guidance to practitioners for hyperparameter tuning.</p> Saeyoung Rho, Rachel Cummings, Vishal Misra Copyright (c) 2024 Saeyoung Rho, Rachel Cummings, Vishal Misra Mon, 24 Jun 2024 00:00:00 -0700 Differentially Private Fine-tuning of Language Models <p>We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuning with DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $\epsilon = 6.8,\delta=$ 1e-5) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.</p> Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang Copyright (c) 2024 Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang Mon, 24 Jun 2024 00:00:00 -0700 Generalized Rainbow Differential Privacy <p>We study a new framework for designing differentially private (DP) mechanisms via randomized graph colorings, called rainbow differential privacy. In this framework, datasets are nodes in a graph, and two neighboring datasets are connected by an edge. Each dataset in the graph has a preferential ordering for the possible outputs of the mechanism, and these orderings are called rainbows. Different rainbows partition the graph of connected datasets into different regions. We show that if a DP mechanism at the boundary of such regions is fixed and it behaves identically for all same-rainbow boundary datasets, then a unique optimal $(\epsilon,\delta)$-DP mechanism exists (as long as the boundary condition is valid) and can be expressed in closed-form. Our proof technique is based on an interesting relationship between dominance ordering and DP, which applies to any finite number of colors and for $(\epsilon,\delta)$-DP, improving upon previous results that only apply to at most three colors and for $\epsilon$-DP. We justify the homogeneous boundary condition assumption by giving an example with non-homogeneous boundary condition, for which there exists no optimal DP mechanism.</p> Yuzhou Gu, Ziqi Zhou, Onur Günlü, Rafael G. L. D'Oliveira, Parastoo Sadeghi, Muriel Médard, Rafael F. Schaefer Copyright (c) 2024 Yuzhou Gu, Ziqi Zhou, Onur Günlü, Rafael G. L. D'Oliveira, Parastoo Sadeghi, Muriel Médard, Rafael F. Schaefer Mon, 24 Jun 2024 00:00:00 -0700 On the Connection Between the ABS Perturbation Methodology and Differential Privacy <p>This paper explores analytical connections between the perturbation methodology of the Australian Bureau of Statistics (ABS) and the differential privacy (DP) framework. We consider a single static counting query function and find the analytical form of the perturbation distribution with symmetric support for the ABS perturbation methodology. We then analytically measure the DP parameters, namely the (ε, δ) pair, for the ABS perturbation methodology under this setting. The results and insights obtained about the behaviour of (ε, δ) with respect to the perturbation support and variance are used to judiciously select the variance of the perturbation distribution give a good δ in the DP framework for a given desired ε and perturbation support. Finally, we propose a simple sampling scheme to implement the perturbation probability matrix in the ABS Cellkey method. The post sampling (ε, δ) pair is numerically analysed as a function of the Cellkey size. It is shown that the best results are obtained for a larger Cellkey size, because the (ε, δ) pair ost-sampling measures remain almost identical when we compare sampling and theoretical results.</p> Parastoo Sadeghi , Chien-Hung Chien Copyright (c) 2024 Chien-Hung Chien, Parastoo Sadeghi Tue, 02 Jul 2024 00:00:00 -0700