A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms

Bai Li; Vishesh Karwa; Aleksandra Slavković; Rebecca Carter Steorts

doi:10.29012/jpc.657

Published: Dec 28, 2018

Keywords:

differential privacy, high dimensional sparse histograms, stability based algorithm, perturbed Gibbs sampler, Stability Based Hashed Gibbs Sampler

Bai Li

Duke University

Vishesh Karwa

Temple University

Aleksandra Slavković

Pennsylvania State University

Rebecca Carter Steorts

Duke University

https://orcid.org/0000-0003-0114-8181

Abstract

Differential privacy has emerged as a popular model to provably limit privacy risks associated with a given data release. However, releasing high-dimensional synthetic data under differential privacy remains a challenging problem. In this paper, we study the problem of releasing synthetic data in the form of a high-dimensional histogram under the constraint of differential privacy.
We develop an $(\epsilon, \delta)$-differentially private categorical data synthesizer called \emph{Stability Based Hashed Gibbs Sampler} (SBHG). SBHG works by combining a stability-based sparse histogram estimation algorithm with Gibbs sampling and feature selection to approximate the empirical joint distribution of a discrete dataset. SBHG offers a competitive alternative to state-of-the-art synthetic data generators while preserving the sparsity structure of the original dataset, which leads to improved statistical utility as illustrated on simulated data. Finally, to study the utility of the synthetic datasets generated by SBHG, we also perform logistic regression using the synthetic datasets and compare the classification accuracy with that obtained using the original dataset.
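
As a rough illustration of the stability-based sparse histogram idea mentioned in the abstract, the sketch below shows the generic textbook form of the technique: Laplace noise is added only to cells that actually occur in the data, and a cell is released only if its noisy count clears a threshold that depends on $\delta$. This is not the SBHG algorithm from the paper (it has no hashing, Gibbs sampling, or feature selection). The function name `stability_histogram`, the noise scale $2/\epsilon$, and the threshold $1 + 2\ln(2/\delta)/\epsilon$ are assumptions taken from the standard presentation of stability-based histograms, not from this article.

```python
import math
import random
from collections import Counter


def stability_histogram(records, epsilon, delta, rng=None):
    """Illustrative stability-based sparse histogram (hypothetical helper).

    Assumes neighboring datasets differ by replacing one record, so at most
    two cell counts change by one each. Not hardened against floating-point
    attacks on the Laplace mechanism.
    """
    rng = rng or random.Random()
    scale = 2.0 / epsilon  # assumed Laplace scale
    # Assumed release threshold: noisy counts at or below it are suppressed.
    threshold = 1.0 + 2.0 * math.log(2.0 / delta) / epsilon

    released = {}
    # Only cells with a nonzero count are visited, so the cost scales with
    # the number of records rather than the size of the full domain.
    for cell, count in Counter(records).items():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy = count + noise
        if noisy > threshold:
            released[cell] = noisy
    return released
```

Because empty cells are never visited, they are never released, so the sparsity pattern of the output can only be a subset of the sparsity pattern of the input. That is the property the abstract refers to as preserving the sparsity structure of the original dataset.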

How to Cite

Li, Bai, Vishesh Karwa, Aleksandra Slavković, and Rebecca Carter Steorts. 2018. “A Privacy Preserving Algorithm to Release Sparse High-Dimensional Histograms”. Journal of Privacy and Confidentiality 8 (1). https://doi.org/10.29012/jpc.657.

Issue

Vol. 8 No. 1 (2018): Commemorating Stephen Fienberg

Section

Articles

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.

Author Biographies

Bai Li, Duke University

Bai Li is currently a first-year PhD student in the Department of Statistical Science at Duke University, where he received his M.S. degree under the supervision of Rebecca C. Steorts.

Vishesh Karwa, Temple University

Assistant Professor of Statistics Faculty

My research addresses the challenges in performing statistical inference using complex and/or massive data such as networks, high-dimensional contingency tables, and data that are missing or incomplete. My work is at the intersection of statistics, machine learning, and theoretical computer science and is motivated by many real-world problems with applications to social, political and behavioral sciences. Some of the problems that I currently work on include: (1) Statistical foundations of data privacy and confidentiality, (2) Causal inference under network interference, (3) Finite-sample inference for network models and high-dimensional contingency tables, and (4) Selective inference and adaptive data analyses.

Vishesh Karwa joined the Department of Statistics in 2017. He is also a member of TDAI. Prior to joining Ohio State, Vishesh spent two years at Harvard in the Department of Statistics and the Department of Computer Science as a postdoctoral fellow and one year at CMU as a research scientist.

Aleksandra Slavković, Pennsylvania State University

Slavkovic is a professor of statistics who joined Penn State in 2004. She has served in various positions in the statistics department, including associate head for diversity and equity and associate head for graduate studies. Slavkovic has affiliated appointments in the Institute for CyberScience, the Department of Public Health Sciences, and the Penn State College of Medicine, and she serves on Penn State’s Clinical and Translational Sciences Director’s Council. She has also held visiting scholar positions at Cornell University, the University of Minnesota, and Utrecht University.

Slavkovic received master's degrees in human-computer interaction and in statistics and a doctoral degree in statistics from Carnegie Mellon University. Her current research interests include statistical data privacy with applications across different domains, algebraic statistics, causal inference, and more broadly the application of statistics to information sciences and social sciences.

Rebecca Carter Steorts, Duke University

Rebecca C. Steorts received her B.S. in Mathematics in 2005 from Davidson College, her MS in Mathematical Sciences in 2007 from Clemson University, and her PhD in 2012 from the Department of Statistics at the University of Florida under the supervision of Malay Ghosh, where she was a U.S. Census Dissertation Fellow and a recipient of an Honorable Mention (second place) for the 2012 Leonard J. Savage Thesis Award in Applied Methodology. Rebecca was a Visiting Assistant Professor in 2012--2015, where she worked closely with Stephen E. Fienberg.

Rebecca is currently an Assistant Professor in the Department of Statistical Science at Duke University. She is affiliated faculty in the Departments of Computer Science and Biostatistics and Bioinformatics, the Information Initiative at Duke (iiD), and the Social Science Research Institute.

Rebecca was named to MIT Technology Review's 35 Innovators Under 35 for 2015 as a humanitarian in the field of software. Her work was profiled in the September/October issue of MIT Technology Review and she was recognized with an invited talk at EmTech in November 2015. In addition, Rebecca is a recipient of an NSF CAREER award, a collaborative NSF award, a collaborative grant with the Laboratory for Analytic Sciences (LAS) at NC State University, a Metaknowledge Network Templeton Foundation Grant, the University of Florida (UF) Graduate Alumni Fellowship Award, the U.S. Census Bureau Dissertation Fellowship Award, and the UF Innovation through Institutional Integration Program (I-Cubed) and NSF for development of an introductory Bayesian course for undergraduates. Her research interests are in large-scale clustering, record linkage (entity resolution or de-duplication), privacy, network analysis, and machine learning for computational social science applications.

Funding data

National Science Foundation
Grant numbers SES-1534412; CAREER-1652431; SES-1534433
