communication efficient private lookup

Question

I have a scenario where a client has a private set and the server has a large public set. We would like an estimate of the intersection size or, more specifically, the percentage of the private set that is covered by the public set.

However, we do not want to send anywhere near the entire public set over the wire; even a Bloom filter would seem rather large.

It's OK for the server to learn the size of the intersection, but it should not learn much about the specific keys the client has.

Is this possible? The priorities are minimizing communication, client compute, and server compute, in that order.

score 3 · Answer 1 · answered Aug 05 '18 at 15:36

Here are some recent papers that study optimizing PSI for the case of significantly unequal set sizes. In all of these papers the primary goal is to reduce communication.

These two papers save on communication by having an offline phase with high communication. When it comes time to actually compute the intersection, the online communication is low:

Unbalanced Approximate Private Set Intersection; Amanda Cristina Davi Resende, Diego F. Aranha
Private Set Intersection for Unequal Set Sizes with Mobile Applications; Ágnes Kiss, Jian Liu, Thomas Schneider, N. Asokan, Benny Pinkas

This paper saves on communication by assuming that the server's large set is held by two (or more) noncolluding parties. That way, it is possible to leverage techniques of private information retrieval:

PIR-PSI: Scaling Private Contact Discovery; Daniel Demmler, Peter Rindal, Mike Rosulek, Ni Trieu

You could argue that the previous papers are "cheating" in the sense that they are not in the standard 2-party setting with a one-off computation.

There are a few papers that achieve low communication without such "cheating", and these all use expensive public-key crypto operations (proportional to the size of the large set). The most efficient one that I know of is this, which uses somewhat homomorphic encryption:

Fast Private Set Intersection from Homomorphic Encryption; Hao Chen, Kim Laine, Peter Rindal

I happen to know a followup to this work will appear at CCS 2018, but it is not available online yet.

You cannot avoid having the protocol "touch" every part of the large set, so the server cost will always be somewhat high. In PIR-PSI the cost is some number of very cheap AES operations for each item in the large set. Hence this one is the fastest that I know about (if you include the cost of the offline pre-computation in the other work). The homomorphic-encryption-based scheme is surprisingly fast, though a little slower than PIR-PSI as far as I know.

The state of the art for all of these supports 10-100 million server items and 500-5000 client items, with costs on the order of a few seconds and maybe 5-20 MB of communication.

score 0 · Answer 2 · answered Aug 06 '18 at 06:24

I have come up with the following solution, which should work when the datasets are not of wildly different sizes and we only want to estimate the intersection size. We can limit ourselves to a random subset: instead of sharing the full public dataset, it is sufficient to share a sufficiently large fragment of it, in the form of a Bloom filter. If the client has m records, it could request, say, a 20/m fragment defined by the last bits of a hash, so that about 20 of its own records fall into the fragment. The server sends a Bloom filter for that fragment (possibly pre-computed); choosing, say, 20 bits per item in the filter gives a reasonable false-positive rate.

I haven't done the exact math for the expected error, but my back-of-the-napkin calculation suggests that with a 20/m fragment and 20 bits per item, the error in the estimated intersection size will be in the area of 10%.
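
A rough way to check that figure, as a sketch under assumptions that are not part of the answer itself: suppose exactly k = 20 client records land in the fragment, each of them is in the public set independently with probability p (the true coverage), and Bloom-filter false positives are ignored. The estimate \hat{p} = hits / k then has standard error

$$
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{k}} \le \sqrt{\frac{0.25}{20}} \approx 0.11 ,
$$

that is, about 11 percentage points in the worst case p = 0.5, which is in line with the "area of 10%" guess. Ignoring false positives looks safe here: with 20 bits per item and about 14 hash functions, the false-positive rate is roughly (1 - e^{-14/20})^{14}, on the order of 10^{-4}, so the sampling error dominates.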

So if the client has m items and the server has n > m items, we can get such an estimate with about 400*n/m bits transferred (a 20/m fraction of the n items at 20 bits each) and very little computation.
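
A minimal Python sketch of this idea follows. It is an illustration only, not a vetted protocol: the names `BloomFilter`, `in_fragment`, `server_build_filter`, and `client_estimate` are hypothetical, SHA-256 stands in for whatever hash is actually used, and the fragment is selected by `hash % buckets`, which matches "the last bits of a hash" when `buckets` is a power of two.

```python
import hashlib

def h(item: str, salt: str) -> int:
    """64-bit hash of item under a salt (SHA-256 is an assumption)."""
    digest = hashlib.sha256((salt + ":" + item).encode()).digest()
    return int.from_bytes(digest[:8], "big")

class BloomFilter:
    """Plain Bloom filter sized at bits_per_item bits per inserted item."""
    def __init__(self, n_items: int, bits_per_item: int = 20):
        self.size = max(8, n_items * bits_per_item)
        # Near-optimal number of hash functions: ln(2) * bits per item.
        self.k = max(1, round(0.693 * bits_per_item))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        return [h(item, f"bloom{i}") % self.size for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p >> 3] >> (p & 7)) & 1
                   for p in self._positions(item))

def in_fragment(item: str, buckets: int, bucket: int) -> bool:
    """True if the item's hash falls in the requested 1/buckets fragment."""
    return h(item, "fragment") % buckets == bucket

def server_build_filter(public_set, buckets: int, bucket: int) -> BloomFilter:
    """Server side: Bloom filter over the public items in one fragment."""
    fragment = [x for x in public_set if in_fragment(x, buckets, bucket)]
    bf = BloomFilter(len(fragment))
    for x in fragment:
        bf.add(x)
    return bf

def client_estimate(private_set, bf: BloomFilter, buckets: int, bucket: int):
    """Client side: fraction of its in-fragment items found in the filter."""
    sample = [x for x in private_set if in_fragment(x, buckets, bucket)]
    if not sample:
        return None  # nothing of ours landed in this fragment
    return sum(x in bf for x in sample) / len(sample)

# Toy data: 20,000 public items; 1,000 private items, half of them public.
public = [f"item{i}" for i in range(20000)]
private = [f"item{i}" for i in range(500)] + [f"priv{i}" for i in range(500)]

buckets = len(private) // 20          # a 20/m fragment, as in the answer
bf = server_build_filter(public, buckets, bucket=0)
print("estimated coverage:", client_estimate(private, bf, buckets, bucket=0))
```

In this sketch the only thing the client sends is `buckets` and `bucket`, so the server learns roughly how large the client's set is but sees none of its keys. Precomputing one filter per bucket for a few fixed values of `buckets` would keep server work at request time close to zero.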

The server is expected to learn something about the size of the client's dataset from the size of the fragment requested, but nothing more.

There is not much crypto in my answer, but I think it will do, except when n is much, much bigger than m.

score 0 · Answer 3 · answered Aug 06 '18 at 06:40

Off-topic answer: at the application level, if you are comfortable with Intel SGX, you can design oblivious algorithms for the lookup and write an SGX application that runs them. It would be fast, though.

Yet, this StackExchange is for crypto... so I label this one as off-topic.
