Understanding Privacy-Utility Tradeoffs in Differentially Private Online Active Learning
Abstract
We consider privacy-preserving learning in the context of online learning. In settings where data instances arrive sequentially in streaming fashion, incremental training algorithms such as stochastic gradient descent (SGD) can be used to learn and update prediction models. When labels are costly to acquire, active learning methods can be used to select samples to be labeled from a stream of unlabeled data. These labeled data samples are then used to update the machine learning models. Privacy-preserving online learning can be used to update predictors on data streams containing sensitive information. The differential privacy framework quantifies the privacy risk in such settings. This work proposes a differentially private online active learning algorithm that uses SGD to retrain the classifiers. We propose two methods for selecting informative samples. We incorporate the algorithm into a general-purpose web application that allows a non-expert user to evaluate the privacy-aware classifier and visualize key privacy-utility tradeoffs. Our application supports linear support vector machines and logistic regression, and it enables an analyst to configure and visualize the effect of using differentially private online active learning versus a non-private counterpart. The application is useful for comparing the privacy-utility tradeoff of different algorithms, which can help decision makers choose which algorithms and parameters to use. Additionally, we use the application to evaluate our SGD-based solution and to show that it generates predictions with a better privacy-utility tradeoff than earlier methods.
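To make the setting concrete, the following is a minimal illustrative sketch of online active learning with a noisy SGD update for logistic regression. It is not the algorithm proposed in the article, whose details are not given in the abstract. The margin-based query rule, the gradient clipping, the Gaussian noise, and every name and parameter (`private_online_active_sgd`, `clip_to_norm`, `lr`, `clip_norm`, `noise_std`, `b`) are assumptions made for illustration, and the sketch performs no privacy accounting, so it claims no particular differential privacy guarantee.

```python
import numpy as np


def clip_to_norm(g, max_norm):
    """Rescale g so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)


def private_online_active_sgd(stream, dim, lr=0.1, clip_norm=1.0,
                              noise_std=0.5, b=1.0, seed=0):
    """Illustrative sketch only (hypothetical names and parameters).

    stream yields (x, y) pairs with y in {-1, +1}. The label y is used
    only when the sample is selected for labeling.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    n_queried = 0
    for x, y in stream:
        margin = float(w @ x)
        # Assumed selection rule: query more often when the current
        # model is uncertain, i.e. when |margin| is small.
        if rng.random() < b / (b + abs(margin)):
            n_queried += 1
            # Gradient of the logistic loss log(1 + exp(-y * w.x)).
            grad = -y * x / (1.0 + np.exp(y * margin))
            # Bound the per-sample gradient norm, then perturb it.
            noisy_grad = clip_to_norm(grad, clip_norm) + rng.normal(
                0.0, noise_std, size=dim)
            w -= lr * noisy_grad
    return w, n_queried


# Usage on a small synthetic stream (assumed data, for illustration).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = np.where(X[:, 0] > 0, 1, -1)
w, n_queried = private_online_active_sgd(zip(X, y), dim=3)
print(f"queried {n_queried} of {len(X)} labels")
```

Raising `noise_std` perturbs each update more at the cost of accuracy, while lowering `b` requests fewer labels. Varying such settings is one simple way to observe a privacy-utility tradeoff of the kind the abstract refers to.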
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright is retained by the authors. By submitting to this journal, the author(s) license the article under the Creative Commons License – Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), unless choosing a more lenient license (for instance, public domain). For situations not allowed under CC BY-NC-ND, short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
Authors of articles published by the journal grant the journal the right to store the articles in its databases for an unlimited period of time and to distribute and reproduce the articles electronically.