I am wondering why (not how) we relax $\epsilon$-differential privacy to $(\epsilon, \delta)$-differential privacy. Is the main motivation to reduce the variance of the noise added to the query, at a slight cost to the strength of the privacy guarantee?
1 Answer
I help implement and ship anonymization strategies based on differential privacy in a large tech company. In my experience, the $\delta$ is mainly used for two reasons.
- Partition selection: when computing histograms over an unbounded key space, you can threshold the results (or do smarter things) and, at the cost of a non-zero $\delta$, avoid having to specify the full list of partitions in advance (a minimal sketch follows this list).
- Gaussian noise: since Gaussian noise is calibrated to the $L_2$ sensitivity, it is very convenient when adding noise to many metrics at once; if a single user can influence $k$ metrics, the noise only needs to scale with $\sqrt{k}$ instead of $k$ as with Laplace noise$^1$. But Gaussian noise doesn't give you pure $\varepsilon$-DP; you have to accept a non-zero $\delta$.
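To make the partition-selection point concrete, here is a minimal sketch (not production code) of thresholded histogram release, assuming each user contributes exactly one record to exactly one partition; the function name `private_histogram` and the particular threshold choice are illustrative assumptions, not any specific library's API.

```python
import numpy as np

def private_histogram(user_keys, epsilon, delta):
    """Noisy histogram over an unbounded key space, with thresholding.

    Sketch assuming each user contributes exactly one record to exactly
    one partition (sensitivity 1). Thresholding is what costs the
    non-zero delta: a partition containing a single user should only
    survive with probability at most delta.
    """
    counts = {}
    for key in user_keys:
        counts[key] = counts.get(key, 0) + 1

    scale = 1.0 / epsilon  # Laplace scale for sensitivity 1
    # One standard threshold: a count of 1 plus Laplace noise exceeds
    # tau with probability at most delta.
    tau = 1.0 + scale * np.log(1.0 / (2.0 * delta))

    released = {}
    for key, count in counts.items():
        noisy = count + np.random.laplace(loc=0.0, scale=scale)
        if noisy >= tau:  # keep only partitions whose noisy count clears the threshold
            released[key] = noisy
    return released

# The key space ("cat", "dog", ...) is never enumerated up front;
# partitions seen by very few users are very unlikely to be released.
print(private_histogram(["cat"] * 500 + ["dog"] * 300 + ["axolotl"], epsilon=1.0, delta=1e-5))
```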
Gaussian noise is particularly common in machine learning use cases. In such contexts, you also often want to use results on amplification by sampling, which also require a non-zero $\delta$.
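To illustrate the $\sqrt{k}$-vs-$k$ scaling, here is a rough sketch comparing the two mechanisms; the calibration $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta_2/\varepsilon$ is the classic Gaussian-mechanism bound (valid for $\varepsilon \le 1$; tighter analyses exist), and the function names are illustrative, not from any particular library.

```python
import numpy as np

def laplace_noise(k, epsilon):
    """Laplace mechanism, calibrated to L1 sensitivity.
    If one user can move each of k metrics by at most 1, the L1
    sensitivity is k, so the per-metric noise scale grows linearly in k."""
    l1_sensitivity = k
    return np.random.laplace(scale=l1_sensitivity / epsilon, size=k)

def gaussian_noise(k, epsilon, delta):
    """Gaussian mechanism, calibrated to L2 sensitivity.
    The same user only moves the metric vector by sqrt(k) in L2 norm,
    so the noise scale grows like sqrt(k), at the price of a non-zero delta.
    Classic calibration (valid for epsilon <= 1):
    sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon."""
    l2_sensitivity = np.sqrt(k)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return np.random.normal(scale=sigma, size=k)

# With k = 100 metrics and epsilon = 1: the Laplace scale is 100 per metric,
# while the Gaussian standard deviation is about 10 * sqrt(2 ln(1.25/delta)).
print(np.std(laplace_noise(100, 1.0)), np.std(gaussian_noise(100, 1.0, 1e-5)))
```

In ML settings this shows up, for example, when per-example gradients are clipped to a fixed $L_2$ norm and Gaussian noise is added to their sum; amplification-by-sampling results then account for the minibatch sampling, which is another place the non-zero $\delta$ appears.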
Ted