Achieving Both Valid and Secure Logistic Regression Analysis on Aggregated Data from Different Private Sources
Main Article Content
Preserving the privacy of individual databases when carrying out statistical calculations has a relatively long history in statistics and had been the focus of much recent attention in machine learning. In this paper, we present a protocol for fitting a logistic regression when the data are held by separate parties---without actually combining information sources---by exploiting results from the literature on multi-party secure computation. Our protocol provides only the final result of the calculation compared with other methods that share intermediate values and thus present an opportunity for compromise of values in the individual databases. Our paper has two themes: (1) the development of a secure protocol for computing the logistic parameters, and a demonstration of its performances in practice, and (2) the presentation of an amended protocol that speeds up the computation of the logistic function. We illustrate the nature of the calculations and their accuracy using an extract of data from the Current Population Survey divided between two parties. Throughout, we build our protocol from existing cryptographic primitives, thus the novelty is in designing a concrete procedure for private computation of the logistic regression MLE rather than to propose new cryptographic constructions.
Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.
Grant numbers DAAD19-02-1-3-0389
National Science Foundation
Grant numbers BCS0941518