Journal of Privacy and Confidentiality 2019-11-16T22:09:51-08:00 Lars Vilhuber Open Journal Systems <p>The <em>Journal of Privacy and Confidentiality</em>&nbsp;is an open-access multi-disciplinary journal whose purpose is to facilitate the coalescence of research methodologies and activities in the areas of privacy, confidentiality, and disclosure limitation. The JPC seeks to publish a wide range of research and review papers, not only from academia, but also from government (especially official statistical agencies) and industry, and to serve as a forum for exchange of views, discussion, and news.</p> Editorial for Volume 9 Issue 2 2019-11-16T22:09:24-08:00 Jonathan Ullman Lars Vilhuber <p>Differential privacy is a promising approach to privacy-preserving data analysis that provides strong worst-case guarantees about the harm that a user could suffer from contributing their data, but is also flexible enough to allow for a wide variety of data analyses to be performed with a high degree of utility. Researchers in differential privacy span many distinct research communities, including algorithms, computer security, cryptography, databases, data mining, machine learning, statistics, programming languages, social sciences, and law.</p> <p>Two articles in this issue describe applications of differentially private, or nearly differentially private, algorithms to data from the U.S. Census Bureau. 
A third article highlights a thorny issue that applies to all implementations of differential privacy: how to choose the key privacy parameter <strong><span title="Greek language text" lang="el">ε</span></strong>.</p> <p>This special issue also includes selected contributions from the 3rd Workshop on Theory and Practice of Differential Privacy, which was held in Dallas, TX on October 30, 2017 as part of the ACM Conference on Computer and Communications Security (CCS).</p> 2019-10-17T11:55:44-07:00 ##submission.copyrightStatement## Releasing Earnings Distributions using Differential Privacy 2019-11-16T22:09:23-08:00 Andrew David Foote Ashwin Machanavajjhala Kevin McKinney <p>The U.S. Census Bureau recently released data on earnings percentiles of graduates from post-secondary institutions. This paper describes and evaluates the disclosure avoidance system developed for these statistics. We propose a differentially private algorithm for releasing these data based on standard differentially private building blocks, by constructing a histogram of earnings and applying the Laplace mechanism to recover a differentially private CDF of earnings. We demonstrate that our algorithm can release earnings distributions with low error, and that it outperforms prior work based on the concept of smooth sensitivity from Nissim et al. (2007).</p> 2019-10-18T13:37:54-07:00 ##submission.copyrightStatement## Differential Privacy in Practice: Expose your Epsilons! 2019-11-16T22:09:23-08:00 Cynthia Dwork Nitin Kohli Deirdre Mulligan <p>Differential privacy is at a turning point. Implementations have been successfully leveraged in private industry, the public sector, and academia in a wide variety of applications, giving scientists, engineers, and researchers the ability to learn about populations of interest without specifically learning about the individuals in them. 
Because differential privacy allows us to quantify cumulative privacy loss, these differentially private systems will, for the first time, allow us to measure and compare the total privacy loss due to these personal data-intensive activities. Appropriately leveraged, this could be a watershed moment for privacy.</p> <p>As with other technologies and techniques that allow for a range of instantiations, implementation details matter. When meaningfully implemented, differential privacy supports deep data-driven insights with minimal worst-case privacy loss. When not meaningfully implemented, differential privacy delivers privacy mostly in name. Using differential privacy to maximize learning while providing a meaningful degree of privacy requires judicious choices with respect to the privacy parameter epsilon, among other factors. However, there is little understanding of what the optimal value of epsilon is for a given system, or for classes of systems, purposes, or data, or of how to determine it.</p> <p>To understand current differential privacy implementations and how organizations make these key choices in practice, we conducted interviews with practitioners to learn from their experiences of implementing differential privacy. We found neither a clear consensus on how to choose epsilon nor agreement on how to approach this and other key implementation decisions. Given the importance of these implementation details, there is a need for shared learning amongst the differential privacy community. 
To serve these purposes, we propose the creation of the Epsilon Registry—a publicly available communal body of knowledge about differential privacy implementations that can be used by various stakeholders to drive the identification and adoption of judicious differentially private implementations.</p> 2019-10-20T17:33:49-07:00 ##submission.copyrightStatement## A Practical Method to Reduce Privacy Loss When Disclosing Statistics Based on Small Samples 2019-11-16T22:09:21-08:00 Raj Chetty John N Friedman <p>We develop a simple method to reduce privacy loss when disclosing statistics such as OLS regression estimates based on samples with small numbers of observations. We focus on the case where the dataset can be broken into many groups (“cells”) and one is interested in releasing statistics for one or more of these cells. Building on ideas from the differential privacy literature, we add noise to the statistic of interest in proportion to the statistic's maximum observed sensitivity, defined as the maximum change in the statistic from adding or removing a single observation across all the cells in the data. Intuitively, our approach permits the release of statistics in arbitrarily small samples by adding sufficient noise to the estimates to protect privacy. Although our method does not offer a formal privacy guarantee, it generally outperforms widely used methods of disclosure limitation such as count-based cell suppression both in terms of privacy loss and statistical bias. We illustrate how the method can be implemented by discussing how it was used to release estimates of social mobility by Census tract in the Opportunity Atlas. 
We also provide a step-by-step guide and illustrative Stata code to implement our approach.</p> 2019-10-22T14:25:36-07:00 ##submission.copyrightStatement## Differential Privacy on Finite Computers 2019-11-16T22:09:27-08:00 Victor Balcer Salil Vadhan <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">We consider the problem of designing and analyzing differentially private algorithms that can be implemented on <em>discrete</em> models of computation in <em>strict</em> polynomial time, motivated by known attacks on floating point implementations of real-arithmetic differentially private algorithms (Mironov, CCS 2012) and the potential for timing attacks on expected polynomial-time algorithms.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">As a case study, we examine the basic problem of approximating the histogram of a categorical dataset over a possibly large data universe X.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">The classic Laplace Mechanism (Dwork, McSherry, Nissim, Smith, TCC 2006 and J. 
Privacy &amp; Confidentiality 2017) does not satisfy our requirements, as it is based on real arithmetic, and natural discrete analogues, such as the Geometric Mechanism (Ghosh, Roughgarden, Sundararajan, STOC 2009 and SICOMP 2012), take time at least linear in |X|, which can be exponential in the bit length of the input.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">In this paper, we provide strict polynomial-time discrete algorithms for approximate histograms whose simultaneous accuracy (the maximum error over all bins) matches that of the Laplace Mechanism up to constant factors, while retaining the same (pure) differential privacy guarantee.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">One of our algorithms produces a sparse histogram as output.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">Its "per-bin accuracy" (the error on individual bins) is worse than that of the Laplace Mechanism by a factor of log|X|, but we prove a lower bound showing that this is necessary for any algorithm that produces a sparse histogram.</p> <p style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;">A second algorithm avoids this lower bound, and matches the per-bin accuracy of the Laplace Mechanism, by producing a compact and efficiently computable representation of a dense histogram; it is based on an (n+1)-wise independent implementation of an appropriately clamped version of the Discrete Geometric Mechanism.</p> 2019-09-25T18:13:34-07:00 ##submission.copyrightStatement## BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model 2019-11-16T22:09:26-08:00 Brendan Avent Aleksandra Korolova David Zeber Torgeir Hovden Benjamin Livshits <pre style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;"><span style="color: #000000;">We propose a <em>hybrid</em></span><span style="color: #000000;"> model of 
differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of <em>blended </em></span><span style="color: #000000;">algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees.</span></pre> <pre style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;"><span style="color: #000000;">We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work.</span></pre> <pre style="-qt-block-indent: 0; text-indent: 0px; margin: 0px;"><span style="color: #000000;">Specifically, on two large search click data sets, comprising </span><span style="color: #000000;">1.75 and </span><span style="color: #000000;">16 GB, respectively, our approach attains NDCG </span><span style="color: #000000;">values exceeding </span><span style="color: #000000;">95</span><span style="color: #000000;">% across a range of privacy budget values.</span></pre> 2019-09-27T11:31:05-07:00 ##submission.copyrightStatement## Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained ERM 2019-11-16T22:09:25-08:00 Steven Wu Aaron Roth Katrina Ligett Bo Waggoner Seth Neel <p>Traditional approaches to differential privacy assume a fixed privacy requirement ε for a computation, and attempt to maximize the accuracy of the computation subject to the privacy constraint. As differential privacy is increasingly deployed in practical settings, it may often be that there is instead a fixed accuracy requirement for a given computation and the data analyst would like to maximize the privacy of the computation subject to the accuracy constraint. 
This raises the question of how to find and run a maximally private empirical risk minimizer subject to a given accuracy requirement. We propose a general “noise reduction” framework that can apply to a variety of private empirical risk minimization (ERM) algorithms, using them to “search” the space of privacy levels to find the empirically strongest one that meets the accuracy constraint, and incurring only logarithmic overhead in the number of privacy levels searched. The privacy analysis of our algorithm leads naturally to a version of differential privacy where the privacy parameters are dependent on the data, which we term ex-post privacy, and which is related to the recently introduced notion of privacy odometers. We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. Finally, we apply our approach to two common objective functions, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger, empirical baseline based on binary search.</p> 2019-09-27T11:35:26-07:00 ##submission.copyrightStatement## Program for TPDP 2017 2019-11-16T22:09:51-08:00 Jonathan Ullman Lars Vilhuber <p>The Theory and Practice of Differential Privacy workshop (TPDP 2017) was held in Dallas, TX, USA on October 30, 2017 as part of CCS 2017. This is the final program.</p> 2018-11-21T11:56:51-08:00 ##submission.copyrightStatement##