
I am trying to build a solution for the following problem and I think it is relevant to crypto:

Party A has a set M of indices, such as the IDs of a group of people. Party B holds information about a larger group of people, whose IDs form a set N. Assume M is a subset of N. Party B agrees to provide, for a business purpose, the information on whichever people A requests. Party A wants all of B's information on the people in M, but does not want to reveal any ID in M to Party B in the process.

Does anyone know if this is possible?

Thanks.

cypherfox
S. Luo

2 Answers


You're looking for Private Information Retrieval (PIR) or Oblivious Transfer (OT), and possibly also Private Set Intersection (PSI).

PIR enables a client to query a server for the $i$-th entry of a database without the server learning the query. However, PIR alone does not protect the database: the client may learn values at other indices too.
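
To make the PIR idea concrete, here is a minimal sketch of the classic two-server, XOR-based scheme. It assumes two servers that hold identical copies of the database and do not collude, which is an extra assumption not in the question (with a single Party B you would need a computational, single-server PIR scheme instead). The record contents and function names are made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, i: int):
    """Client: build two selection vectors that differ only at index i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random bits
    q2 = q1.copy()
    q2[i] ^= 1  # flip the bit of the wanted index
    return q1, q2


def server_answer(db, selection) -> bytes:
    """Server: XOR together every record whose selection bit is 1."""
    acc = bytes(len(db[0]))  # all-zero accumulator
    for record, bit in zip(db, selection):
        if bit:
            acc = xor_bytes(acc, record)
    return acc


def retrieve(db, i: int) -> bytes:
    """Run the whole exchange; both 'servers' are simulated on the same list."""
    q1, q2 = make_queries(len(db), i)
    a1 = server_answer(db, q1)  # would be sent by server 1
    a2 = server_answer(db, q2)  # would be sent by server 2
    return xor_bytes(a1, a2)    # everything cancels except record i


# Hypothetical database; all records are padded to the same length.
db = [b"alice---", b"bob-----", b"carol---", b"dave----"]
record = retrieve(db, 2)
```

Each query on its own is a uniformly random bit vector, so neither server learns $i$ as long as the two do not compare notes. This toy version sends $n$ bits per query; practical schemes reduce that cost.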

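OT is a stronger PIR without this leak: the client learns only the entry it asked for. PIR with this extra guarantee is also called symmetric PIR (SPIR).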
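
As an illustration of OT, below is a rough sketch of a 1-out-of-2 OT using the textbook RSA-based construction (Even, Goldreich, Lempel). It assumes honest-but-curious parties and uses a tiny, insecure toy RSA key so the numbers stay readable; the messages and function names are hypothetical. It is not a secure implementation.

```python
import secrets

# Toy RSA key held by the sender. Far too small for real use.
P, Q = 61, 53
N = P * Q
E = 17
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def sender_offer():
    """Sender: publish two random values, one per message."""
    return secrets.randbelow(N), secrets.randbelow(N)


def receiver_choose(x0: int, x1: int, b: int):
    """Receiver: blind the chosen x_b with a random k encrypted under E."""
    k = secrets.randbelow(N)
    v = ((x0, x1)[b] + pow(k, E, N)) % N
    return v, k  # v is sent to the sender, k stays secret


def sender_respond(v: int, x0: int, x1: int, m0: int, m1: int):
    """Sender: derive two candidate keys; only one equals the receiver's k."""
    k0 = pow((v - x0) % N, D, N)
    k1 = pow((v - x1) % N, D, N)
    return (m0 + k0) % N, (m1 + k1) % N


def receiver_recover(c0: int, c1: int, b: int, k: int) -> int:
    """Receiver: unmask the chosen message; the other stays hidden."""
    return ((c0, c1)[b] - k) % N


def run_ot(m0: int, m1: int, b: int) -> int:
    x0, x1 = sender_offer()
    v, k = receiver_choose(x0, x1, b)
    c0, c1 = sender_respond(v, x0, x1, m0, m1)
    return receiver_recover(c0, c1, b, k)


# Hypothetical messages, both smaller than N.
got = run_ot(1234, 2020, 1)
```

The sender cannot tell which of `k0` and `k1` is the receiver's real `k`, so it does not learn the choice bit `b`, and the receiver can unmask only one of the two messages.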

PSI enables a client to find the intersection of its secret set and the server's secret set. You might want to run PSI before PIR to filter out non-members before trying to look them up, which reduces communication costs. Possibly also of interest: Reactive PSI (RePSI), which reduces the ability of a malicious client to enumerate the server's set.
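
For a feel of how PSI can work, here is a minimal sketch of a Diffie-Hellman-style PSI, where both sides blind hashed items with secret exponents and compare the doubly blinded values. This is only one of several PSI constructions. The group (integers modulo the Mersenne prime $2^{127}-1$), the IDs, and all function names are assumptions for illustration; the parameters are toy-sized and the sketch assumes honest-but-curious parties.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2^127 - 1. Not a secure choice of group.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a number modulo P (toy hash-to-group)."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % P


def random_exponent() -> int:
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 2) + 1
        if math.gcd(k, P - 1) == 1:
            return k


def blind(items, key: int):
    """Hash each item and raise it to the secret exponent."""
    return [pow(hash_to_group(x), key, P) for x in items]


def reblind(values, key: int):
    """Raise already-blinded values to a second secret exponent."""
    return [pow(v, key, P) for v in values]


# Hypothetical inputs.
client_ids = ["id-17", "id-42", "id-99"]
server_ids = ["id-03", "id-17", "id-42", "id-58"]

a = random_exponent()  # client's secret
b = random_exponent()  # server's secret

client_blinded = blind(client_ids, a)        # client -> server
double_blinded = reblind(client_blinded, b)  # server -> client, same order
server_blinded = blind(server_ids, b)        # server -> client (shuffled in practice)

# Client finishes the blinding of the server's items and compares.
server_double = set(reblind(server_blinded, a))
intersection = [x for x, v in zip(client_ids, double_blinded) if v in server_double]
```

Because exponentiation commutes, an item held by both sides ends up with the same doubly blinded value, while the server only ever sees the client's items blinded under `a`.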


A much weaker alternative may be of interest too. If the database is large but reasonably partitioned, and the client is willing to reveal which partition it is querying, then you can use k-anonymity.

For example, Have I Been Pwned's password lookup takes the first 5 hex characters of the hash of your password to identify a partition of the large password-hash database. The server returns that smaller but still large partition, and you filter it locally down to your exact hash/key and read its value.
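
The prefix-lookup pattern can be sketched as follows. This is an in-memory stand-in, not the real Have I Been Pwned API: the `partitions` dictionary plays the role of the server, SHA-1 is assumed as the hash, and the records and function names are hypothetical.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 5  # number of hex characters revealed to the server


def sha1_hex(key: str) -> str:
    return hashlib.sha1(key.encode()).hexdigest().upper()


def build_partitions(records: dict) -> dict:
    """Server side: group records by hash prefix, keyed by hash suffix."""
    partitions = defaultdict(dict)
    for key, value in records.items():
        h = sha1_hex(key)
        partitions[h[:PREFIX_LEN]][h[PREFIX_LEN:]] = value
    return partitions


def server_lookup(partitions: dict, prefix: str) -> dict:
    """Server side: return the whole partition for a prefix."""
    return dict(partitions.get(prefix, {}))


def client_query(partitions: dict, key: str):
    """Client side: send only the prefix, then filter the partition locally."""
    h = sha1_hex(key)
    bucket = server_lookup(partitions, h[:PREFIX_LEN])  # server sees 5 chars
    return bucket.get(h[PREFIX_LEN:])                   # exact match stays local


# Hypothetical data: key -> value stored by the server.
partitions = build_partitions({"password123": 5, "hunter2": 9, "letmein": 2})
value = client_query(partitions, "hunter2")
missing = client_query(partitions, "not-in-the-database")
```

The server only ever sees the 5-character prefix, so from its point of view the query could be for any key in that partition.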

Although the server does not learn the user's password, related queries can uniquely identify the user if they have multiple compromised passwords tied to a common username or email address. That is, there may be only one user whose two password hashes start with "12345" and "abcde".

If the client searches for information on "Bobby Smith" and "Carol Smith" and these two people are in separate partitions, the server may link the two queries and narrow down what the client is looking for. Even if all queries are independent and unrelated, k-anonymity still reveals which small subset of the large database the client is interested in.

cypherfox

An approach I propose for this problem is the following:

  1. Party A requests all the content from Party B (e.g. a JSON file);
  2. Once Party A is in possession of N, it can filter N locally and take the smaller subset it needs;
  3. Finally, Party A keeps that filtered subset as the information for M.
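
The steps above can be sketched as follows, assuming B ships its whole data set as a JSON object keyed by ID. The IDs, fields, and variable names are hypothetical.

```python
import json

# Step 1: what Party B would send, here a hypothetical JSON dump of all of N.
full_dump = json.dumps({
    "id-03": {"name": "Dana"},
    "id-17": {"name": "Bobby Smith"},
    "id-42": {"name": "Carol Smith"},
    "id-58": {"name": "Eli"},
})

# Party A's secret set M; it never leaves A's machine.
wanted_ids = {"id-17", "id-42"}

# Steps 2 and 3: parse everything locally and keep only the records in M.
all_records = json.loads(full_dump)
selected = {rid: rec for rid, rec in all_records.items() if rid in wanted_ids}
```

B learns nothing about M this way, at the cost of transferring and exposing all of N to A.
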
Maf