Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Regression Samples

Martin Klein; Thomas Mathew; Bimal Sinha

doi:10.29012/jpc.v6i1.637

Published: Jun 1, 2014

Keywords:

Disclosure risk evaluation, EM algorithm, Maximum likelihood estimator, Partially synthetic data, Tobit regression, Top coding, Tuning parameter

Martin Klein

Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, DC

Thomas Mathew

Center for Statistical Research and Methodology, U.S. Census Bureau, Washington, DC

Bimal Sinha

Center for Disclosure Avoidance Research, U.S. Census Bureau, Washington, DC, USA, and Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, MD

Abstract

In this article, multiplication of original data values by random noise is suggested as a disclosure control strategy when only the top part of the data is sensitive, as is often the case with income data. The proposed method can serve as an alternative to top coding, which is a standard method in this context. Because the log-normal distribution usually fits income data well, the present investigation focuses exclusively on the log-normal. It is assumed that the log-scale mean of the sensitive variable is described by a linear regression on a set of non-sensitive covariates, and we show how a data user can draw valid inference on the parameters of the regression. An appealing feature of noise multiplication is the presence of an explicit tuning mechanism, namely, the noise-generating distribution. By appropriately choosing this distribution, one can control the accuracy of inferences and the level of disclosure protection desired in the released data. Usually, more information is retained on the top part of the data under noise multiplication than under top coding. Likelihood-based analysis is developed when only the large values in the data set are noise multiplied, under the assumption that the original data form a sample from a log-normal distribution. In this scenario, data analysis methods are developed under two types of data releases: (I) each released value includes an indicator of whether or not it has been noise multiplied, and (II) no such indicator is provided. A simulation study is carried out to assess the accuracy of inference for some parameters of interest. Since top coding and synthetic data methods are already available as disclosure control strategies for extreme values, some comparisons with the proposed method are made through a simulation study. The results are illustrated with a data analysis example based on 2000 U.S. Current Population Survey data. Furthermore, a disclosure risk evaluation of the proposed methodology is presented in the context of the Current Population Survey data example, and the disclosure risk of the proposed noise multiplication method is compared with the disclosure risk of synthetic data.
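The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Uniform(0.9, 1.1) noise distribution, the 95th-percentile threshold, the regression coefficients, and the function names `noise_multiply_top` and `top_code` are all assumptions chosen for illustration. The sketch covers only the data release step, not the likelihood-based inference developed in the article.

```python
import numpy as np


def noise_multiply_top(y, threshold, rng, low=0.9, high=1.1):
    """Multiply values above `threshold` by independent Uniform(low, high) noise.

    The uniform noise distribution is an illustrative assumption; the
    noise-generating distribution is the tuning mechanism and could differ.
    Returns the released values and a boolean indicator of perturbed values.
    """
    y = np.asarray(y, dtype=float)
    perturbed = y > threshold
    noise = rng.uniform(low, high, size=y.shape)
    released = np.where(perturbed, y * noise, y)
    return released, perturbed


def top_code(y, threshold):
    """Standard top coding: every value above `threshold` is replaced by it."""
    return np.minimum(np.asarray(y, dtype=float), threshold)


rng = np.random.default_rng(0)

# Simulated log-normal regression sample (coefficients are hypothetical).
n = 1000
x = rng.normal(size=n)                                   # non-sensitive covariate
log_y = 10.0 + 0.5 * x + rng.normal(scale=0.8, size=n)   # log-scale linear model
y = np.exp(log_y)                                        # sensitive variable

threshold = np.quantile(y, 0.95)                         # assumed cutoff for "top part"
released, perturbed = noise_multiply_top(y, threshold, rng)
top_coded = top_code(y, threshold)

# Release type (I) would publish (released, perturbed);
# release type (II) would publish `released` alone.
```

In this sketch, values at or below the threshold are released unchanged, while each larger value stays within 10% of its original. Top coding, by contrast, collapses all large values onto the threshold, which is consistent with the abstract's remark that noise multiplication usually retains more information about the top part of the data.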

How to Cite

Klein, Martin, Thomas Mathew, and Bimal Sinha. 2014. “Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-Normal Regression Samples”. Journal of Privacy and Confidentiality 6 (1). https://doi.org/10.29012/jpc.v6i1.637.

Issue

Vol. 6 No. 1 (2014)

Section

Articles

Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.
