Multiple attributes under shuffled differential privacy

Question

Notation: eps_c (epsilon central), eps_l (epsilon local), n (number of users), d (number of attributes). A single attribute A_i may take |A_i| = r_i different values, for i in [1,d].

Let's suppose each user holds d discrete attributes (e.g., A_1 = Gender with 2 values, A_2 = Age ranges with r_2 values, ..., A_d = Hours online with 24 values).

Some papers, like [1], provide amplification results for the generalized randomized response model (a.k.a. k-RR) if eps_c <= 1. That is, the central privacy parameter must be at most 1 for eps_l (local) to be amplified.

If we want to collect many attributes (d>1):

1. Can we set eps_split = eps_c / d <= 1? This would imply a 'total' central privacy greater than 1, depending on the values of eps_split and d.
2. Or is it strictly necessary, for a population of n users, that eps_c = d*eps_split <= 1? This would imply that, no matter how many attributes you want to collect, the amplification only holds if the central privacy guarantee remains at most 1.

Along the same lines, suppose only (2) holds and that, instead of splitting eps_c/d for each attribute, we randomly sample a single attribute per user and spend the whole eps_c on it. Does the amplification then use n (the total number of users, assuming a uniformly random sampling technique) or n/d (the number of users answering each attribute)?

[1] Balle, B., Bell, J., Gascón, A. and Nissim, K., 2019, August. The privacy blanket of the shuffle model. In Annual International Cryptology Conference (pp. 638-667). Springer, Cham.

Thank you for your time and help on this subject.

score 2 · Accepted Answer · edited May 07 '21 at 12:28

The theorem in the paper you refer to assumes $\varepsilon<1$ only because it simplifies the analysis and the proof; amplification happens regardless of the value of $\varepsilon$. If you look closely at the proof (page 10), you can find a tighter statement: the result holds for any $\varepsilon$ and $\delta$ such that $\mathbb{P}\left[\frac{N_1}{N_2}\geq e^\varepsilon\right]\le\delta$, where $N_1$ and $N_2$ are independent random variables sampled from $\text{Bin}\left(n-1,\frac{\gamma}{k}\right)$.

If, instead of bounding that quantity using a Chernoff bound and assumptions on $\varepsilon$, you estimate it numerically, you obtain much smaller values of $\varepsilon$ and $\delta$, which shows that amplification happens regardless of the value of $\varepsilon$. In a paper I co-authored, we do exactly that; the problem is different but the math ends up being the same. This is Theorem 3 there, and the comparison between the closed-form formula and the numerical estimate is shown in Figure 2.
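To make the numerical route concrete, here is a minimal sketch (not the code from either paper) that evaluates $\mathbb{P}\left[\frac{N_1}{N_2}\geq e^\varepsilon\right]$ exactly by summing over the binomial distribution. The function names `binom_pmf` and `delta_for` are hypothetical, and `gamma` and `k` are simply the parameters appearing in $\text{Bin}\left(n-1,\frac{\gamma}{k}\right)$. One assumption to flag: the event is evaluated as $N_1 \geq e^\varepsilon N_2$, so outcomes with $N_2 = 0$ count towards $\delta$; this is a conservative convention of the sketch, not something stated in the proof.

```python
import math


def binom_pmf(m, p):
    """Return [P[Bin(m, p) = j] for j in 0..m], computed in log space."""
    if p <= 0.0:
        return [1.0] + [0.0] * m
    if p >= 1.0:
        return [0.0] * m + [1.0]
    log_p, log_q = math.log(p), math.log1p(-p)
    return [
        math.exp(
            math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
            + j * log_p + (m - j) * log_q
        )
        for j in range(m + 1)
    ]


def delta_for(eps, n, gamma, k):
    """P[N1 >= e^eps * N2] for independent N1, N2 ~ Bin(n - 1, gamma / k).

    Assumption of this sketch: outcomes with N2 = 0 are counted as
    satisfying the event, which can only overestimate delta.
    """
    m = n - 1
    pmf = binom_pmf(m, gamma / k)
    # tail[j] = P[N >= j]; tail[m + 1] = 0.
    tail = [0.0] * (m + 2)
    for j in range(m, -1, -1):
        tail[j] = tail[j + 1] + pmf[j]
    ratio = math.exp(eps)
    total = 0.0
    for n2 in range(m + 1):
        # Smallest integer n1 such that n1 >= e^eps * n2.
        threshold = math.ceil(ratio * n2)
        if threshold > m:
            break  # larger n2 only raises the threshold further
        total += pmf[n2] * tail[threshold]
    return min(total, 1.0)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from either paper.
    for eps in (0.1, 0.5, 1.0, 2.0):
        print(eps, delta_for(eps, n=10_000, gamma=0.5, k=10))
```

For a fixed target $\delta$, the corresponding $\varepsilon$ can then be found by bisection, since the returned probability is non-increasing in `eps`.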

