An Evaluation Framework for Privacy-Preserving Record Linkage
Abstract
Privacy-preserving record linkage (PPRL) addresses the problem of identifying records in different databases that correspond to the same real-world entities, using quasi-identifying attributes in the absence of unique entity identifiers, while preserving the privacy of these entities. Privacy is preserved by not revealing, during or after the linkage process, (1) any information that could be used to infer the actual values of records that are not reconciled to the same entity (non-matches), or (2) any confidential or sensitive information about records that were reconciled to the same entity (matches) beyond what the data custodians have agreed to reveal. PPRL faces three main challenges: scalability to large databases, high linkage quality in the presence of data quality errors, and sufficient privacy guarantees. While many solutions to the PPRL problem have been developed over the past two decades, a framework for evaluating and comparing PPRL solutions, with standard numerical measures defined for all three properties (scalability, linkage quality, and privacy), has so far not been presented in the literature. We propose a general framework with normalized measures to practically evaluate and compare PPRL solutions in the face of linkage attack methods that are based on an external global dataset. We conducted experiments with several existing PPRL solutions on real-world databases using the proposed framework, and the results show that it provides an extensive and comparative evaluation of PPRL solutions in terms of the three properties.
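The abstract does not spell out the framework's measures, so the following is only an illustrative sketch of what a normalized linkage quality measure could look like. It assumes the standard precision, recall, and F-measure computed over sets of record pairs, which may differ from the measures the article actually defines. The function name `linkage_quality` and the record identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

# A candidate match is a pair of record identifiers, one from each database.
Pair = Tuple[str, str]


def linkage_quality(predicted: Set[Pair], true_matches: Set[Pair]) -> Dict[str, float]:
    """Return precision, recall, and F-measure, each normalized to [0, 1].

    Assumption: these standard measures stand in for the article's own
    linkage quality measures, which the abstract does not specify.
    """
    true_positives = len(predicted & true_matches)
    # By convention here, an empty prediction or truth set scores 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_matches) if true_matches else 0.0
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical ground truth and linkage output for two small databases.
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}

scores = linkage_quality(predicted, truth)
print(scores)
```

Because every value lies between 0 and 1, scores of this kind can be placed alongside similarly normalized scalability and privacy scores when comparing solutions.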
Article Details
Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.