Estimating Risks of Identification Disclosure in Partially Synthetic Data

Jerome P. Reiter; Robin Mitra

doi:10.29012/jpc.v1i1.567

Published: Apr 1, 2009

Keywords: Confidentiality, Public use data, Record linkage, Survey

Jerome P. Reiter

Department of Statistical Science, Duke University, Durham, NC

https://orcid.org/0000-0002-8374-3832

Robin Mitra

University of Southampton, Southampton, UK

https://orcid.org/0000-0001-9584-8044

Abstract

To limit disclosures, statistical agencies and other data disseminators can release partially synthetic, public use microdata sets. These comprise the units originally surveyed, but some collected values (for example, sensitive values at high risk of disclosure or values of key identifiers) are replaced with multiple draws from statistical models. Because the original records remain on the file, there is still a risk of identification. In this paper, we describe how to evaluate identification disclosure risks in partially synthetic data, accounting for released information from the multiple datasets, the model used to generate synthetic values, and the approach used to select values to synthesize. We illustrate the computations using the Survey of Youths in Custody.

How to Cite

Reiter, Jerome P., and Robin Mitra. 2009. “Estimating Risks of Identification Disclosure in Partially Synthetic Data”. Journal of Privacy and Confidentiality 1 (1). https://doi.org/10.29012/jpc.v1i1.567.

Issue

Vol. 1 No. 1 (2009): Inaugural Issue

Section

Articles

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.
