https://journalprivacyconfidentiality.org/index.php/jpc/issue/feed
Journal of Privacy and Confidentiality
2024-06-24T06:03:51-07:00
Lars Vilhuber and/or Rachel Cummings, managing-editor@journalprivacyconfidentiality.org
Open Journal Systems

<p>The <em>Journal of Privacy and Confidentiality</em> is an open-access multi-disciplinary journal whose purpose is to facilitate the coalescence of research methodologies and activities in the areas of privacy, confidentiality, and disclosure limitation. The JPC seeks to publish a wide range of research and review papers, not only from academia, but also from government (especially official statistical agencies) and industry, and to serve as a forum for exchange of views, discussion, and news.</p>

https://journalprivacyconfidentiality.org/index.php/jpc/article/view/880
Differentially Private Fine-tuning of Language Models
2023-11-27T09:22:41-08:00
Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

<p>We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve state-of-the-art privacy-versus-utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training.
On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of $38.5$, $42.0$, $43.1$, and $43.8$, respectively (privacy budget $\epsilon = 6.8$, $\delta = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.</p>

2024-06-24T00:00:00-07:00
Copyright (c) 2024 Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A. Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang

https://journalprivacyconfidentiality.org/index.php/jpc/article/view/873
Private Query Release via the Johnson-Lindenstrauss Transform
2024-03-20T08:38:18-07:00
Aleksandar Nikolov

<p>We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower-dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension.
Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst-case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As further applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.</p>

2024-06-24T00:00:00-07:00
Copyright (c) 2024 Aleksandar Nikolov

https://journalprivacyconfidentiality.org/index.php/jpc/article/view/879
Differentially Private Synthetic Control
2024-03-06T13:36:30-08:00
Saeyoung Rho, Rachel Cummings, Vishal Misra

<p>Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., the donor pool) to predict a counterfactual time series of interest (i.e., the target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query, and explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit and that the cost of privacy is small.
Finally, we empirically evaluate our algorithm, showing favorable performance in a variety of parameter regimes, and provide guidance to practitioners for hyperparameter tuning.</p>

2024-06-24T00:00:00-07:00
Copyright (c) 2024 Saeyoung Rho, Rachel Cummings, Vishal Misra

https://journalprivacyconfidentiality.org/index.php/jpc/article/view/896
Generalized Rainbow Differential Privacy
2023-12-27T14:08:33-08:00
Yuzhou Gu, Ziqi Zhou, Onur Günlü, Rafael G. L. D'Oliveira, Parastoo Sadeghi, Muriel Médard, Rafael F. Schaefer

<p>We study a new framework for designing differentially private (DP) mechanisms via randomized graph colorings, called rainbow differential privacy. In this framework, datasets are nodes in a graph, and two neighboring datasets are connected by an edge. Each dataset in the graph has a preferential ordering for the possible outputs of the mechanism, and these orderings are called rainbows. Different rainbows partition the graph of connected datasets into different regions. We show that if a DP mechanism at the boundary of such regions is fixed and behaves identically for all same-rainbow boundary datasets, then a unique optimal $(\epsilon,\delta)$-DP mechanism exists (as long as the boundary condition is valid) and can be expressed in closed form. Our proof technique is based on an interesting relationship between dominance ordering and DP, which applies to any finite number of colors and to $(\epsilon,\delta)$-DP, improving upon previous results that apply only to at most three colors and to $\epsilon$-DP. We justify the homogeneous boundary condition assumption by giving an example with a non-homogeneous boundary condition for which no optimal DP mechanism exists.</p>

2024-06-24T00:00:00-07:00
Copyright (c) 2024 Yuzhou Gu, Ziqi Zhou, Onur Günlü, Rafael G. L.
D'Oliveira, Parastoo Sadeghi, Muriel Médard, Rafael F. Schaefer

https://journalprivacyconfidentiality.org/index.php/jpc/article/view/859
On the Connection Between the ABS Perturbation Methodology and Differential Privacy
2023-03-07T06:53:42-08:00
Parastoo Sadeghi, Chien-Hung Chien

<p>This paper explores analytical connections between the perturbation methodology of the Australian Bureau of Statistics (ABS) and the differential privacy (DP) framework. We consider a single static counting query function and find the analytical form of the perturbation distribution with symmetric support for the ABS perturbation methodology. We then analytically measure the DP parameters, namely the $(\epsilon, \delta)$ pair, for the ABS perturbation methodology under this setting. The results and insights obtained about the behaviour of $(\epsilon, \delta)$ with respect to the perturbation support and variance are used to judiciously select the variance of the perturbation distribution so as to give a good $\delta$ in the DP framework for a given desired $\epsilon$ and perturbation support. Finally, we propose a simple sampling scheme to implement the perturbation probability matrix in the ABS Cellkey method. The post-sampling $(\epsilon, \delta)$ pair is numerically analysed as a function of the Cellkey size. It is shown that the best results are obtained for a larger Cellkey size, because the post-sampling $(\epsilon, \delta)$ measures remain almost identical to the theoretical results.</p>

2024-07-02T00:00:00-07:00
Copyright (c) 2024 Chien-Hung Chien, Parastoo Sadeghi
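The setting of the last abstract, releasing a single counting query with additive noise calibrated to a target $(\epsilon, \delta)$, can be illustrated with a minimal sketch. This uses the textbook Gaussian mechanism rather than the ABS perturbation distribution studied in the paper (whose support is bounded and symmetric); the function name and parameters are illustrative only.

```python
import math
import random

def gaussian_mechanism(true_count, epsilon, delta, sensitivity=1.0):
    """Release a noisy counting-query answer satisfying (epsilon, delta)-DP.

    Uses the classical Gaussian-mechanism calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    which is valid for epsilon < 1. A counting query changes by at most
    1 when one record changes, hence sensitivity = 1 by default.
    """
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_count + random.gauss(0.0, sigma)
```

For example, with $\epsilon = 0.5$ and $\delta = 10^{-5}$, the noise standard deviation is roughly $9.7$, so released counts are unbiased but blurred at that scale; the bounded-support ABS perturbation instead trades this unbounded tail for the $(\epsilon, \delta)$ behaviour analysed in the paper.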