Estimating Risks of Identification Disclosure in Partially Synthetic Data

Jerome P. Reiter; Robin Mitra

doi:10.29012/jpc.v1i1.567

Published: Apr 1, 2009

Keywords:

Confidentiality, Public use data, Record linkage, Survey

Jerome P. Reiter

Department of Statistical Science, Duke University, Durham, NC

https://orcid.org/0000-0002-8374-3832

Robin Mitra

University of Southampton, Southampton, UK

https://orcid.org/0000-0001-9584-8044

Abstract

To limit disclosures, statistical agencies and other data disseminators can release partially synthetic, public use microdata sets. These comprise the units originally surveyed; but some collected values, for example, sensitive values at high risk of disclosure or values of key identifiers, are replaced with multiple draws from statistical models. Because the original records are on the file, there remain risks of identifications. In this paper, we describe how to evaluate identification disclosure risks in partially synthetic data, accounting for released information from the multiple datasets, the model used to generate synthetic values, and the approach used to select values to synthesize. We illustrate the computations using the Survey of Youths in Custody.

How to Cite

Reiter, Jerome P., and Robin Mitra. 2009. “Estimating Risks of Identification Disclosure in Partially Synthetic Data”. Journal of Privacy and Confidentiality 1 (1). https://doi.org/10.29012/jpc.v1i1.567.

Issue

Vol. 1 No. 1 (2009): Inaugural Issue

Section

Articles

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.
