Journal of Privacy and Confidentiality

Generalized Rainbow Differential Privacy

Mon, 24 Jun 2024 00:00:00 -0700

We study a new framework for designing differentially private (DP) mechanisms via randomized graph colorings, called rainbow differential privacy. In this framework, datasets are nodes in a graph, and two neighboring datasets are connected by an edge. Each dataset in the graph has a preferential ordering for the possible outputs of the mechanism, and these orderings are called rainbows. Different rainbows partition the graph of connected datasets into different regions. We show that if a DP mechanism at the boundary of such regions is fixed and it behaves identically for all same-rainbow boundary datasets, then a unique optimal $(\epsilon,\delta)$-DP mechanism exists (as long as the boundary condition is valid) and can be expressed in closed-form. Our proof technique is based on an interesting relationship between dominance ordering and DP, which applies to any finite number of colors and for $(\epsilon,\delta)$-DP, improving upon previous results that only apply to at most three colors and for $\epsilon$-DP. We justify the homogeneous boundary condition assumption by giving an example with non-homogeneous boundary condition, for which there exists no optimal DP mechanism.

On the Connection Between the ABS Perturbation Methodology and Differential Privacy

Parastoo Sadeghi , Chien-Hung Chien — Tue, 02 Jul 2024 00:00:00 -0700

This paper explores analytical connections between the perturbation methodology of the Australian Bureau of Statistics (ABS) and the differential privacy (DP) framework. We consider a single static counting query function and find the analytical form of the perturbation distribution with symmetric support for the ABS perturbation methodology. We then analytically measure the DP parameters, namely the (ε, δ) pair, for the ABS perturbation methodology under this setting. The results and insights obtained about the behaviour of (ε, δ) with respect to the perturbation support and variance are used to judiciously select the variance of the perturbation distribution give a good δ in the DP framework for a given desired ε and perturbation support. Finally, we propose a simple sampling scheme to implement the perturbation probability matrix in the ABS Cellkey method. The post sampling (ε, δ) pair is numerically analysed as a function of the Cellkey size. It is shown that the best results are obtained for a larger Cellkey size, because the (ε, δ) pair ost-sampling measures remain almost identical when we compare sampling and theoretical results.

Differentially Private Synthetic Control

Saeyoung Rho, Rachel Cummings, Vishal Misra — Mon, 24 Jun 2024 00:00:00 -0700

Synthetic control is a causal inference tool used to estimate the treatment effects of an intervention by creating synthetic counterfactual data. This approach combines measurements from other similar observations (i.e., donor pool) to predict a counterfactual time series of interest (i.e., target unit) by analyzing the relationship between the target and the donor pool before the intervention. As synthetic control tools are increasingly applied to sensitive or proprietary data, formal privacy protections are often required. In this work, we provide the first algorithms for differentially private synthetic control with explicit error bounds. Our approach builds upon tools from non-private synthetic control and differentially private empirical risk minimization. We provide upper and lower bounds on the sensitivity of the synthetic control query and provide explicit error bounds on the accuracy of our private synthetic control algorithms. We show that our algorithms produce accurate predictions for the target unit and that the cost of privacy is small. Finally, we empirically evaluate the performance of our algorithm, and show favorable performance in a variety of parameter regimes, as well as provide guidance to practitioners for hyperparameter tuning.

Differentially Private Fine-tuning of Language Models

Mon, 24 Jun 2024 00:00:00 -0700

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. Privately fine-tuning with DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $\epsilon = 6.8,\delta=$ 1e-5) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

Private Query Release via the Johnson-Lindenstrauss Transform

Aleksandar Nikolov — Mon, 24 Jun 2024 00:00:00 -0700

We introduce a new method for releasing answers to statistical queries with differential privacy, based on the Johnson-Lindenstrauss lemma. The key idea is to randomly project the query answers to a lower dimensional space so that the distance between any two vectors of feasible query answers is preserved up to an additive error. Then we answer the projected queries using a simple noise-adding mechanism, and lift the answers up to the original dimension. Using this method, we give, for the first time, purely differentially private mechanisms with optimal worst case sample complexity under average error for answering a workload of $k$ queries over a universe of size $N$. As other applications, we give the first purely private efficient mechanisms with optimal sample complexity for computing the covariance of a bounded high-dimensional distribution, and for answering 2-way marginal queries. We also show that, up to the dependence on the error, a variant of our mechanism is nearly optimal for every given query workload.