Practical Data Synthesis for Large Samples

Gillian M Raab
Beata Nowok
https://orcid.org/0000-0002-0713-2271
Chris Dibben
https://orcid.org/0000-0003-1769-3774

Abstract

We describe results on the creation and use of synthetic data, derived in the context of a project to make synthetic extracts available to users of the UK Longitudinal Studies. We present a critical review of existing methods of inference from large synthetic data sets. We introduce new variance estimates for use with large samples of completely synthesised data; these estimates do not require the synthetic data to be generated from the posterior predictive distribution derived from the observed data, and they can be used with a single synthetic data set. Based on these results, we make recommendations on how to synthesise data. We illustrate the practical consequences with an example from the Scottish Longitudinal Study.

Article Details

How to Cite
Raab, Gillian, Beata Nowok, and Chris Dibben. 2018. “Practical Data Synthesis for Large Samples”. Journal of Privacy and Confidentiality 7 (3), 67-97. https://doi.org/10.29012/jpc.v7i3.407.