5

The $k$-anonymization paradigm (and its refinements) means to create datasets where every tuple is identical with $k-1$ others.

However I'm in a situation where people are in the dataset many times. And I want to follow their progress through the health care system, so I need to know who is who. If I give each person a unique ID, which is necessary in this situation, a linking attack from within the table is possible!

Does anyone know of any relevant theory or have attempted to deal with similar problems?

I'm inclined to think it is impossible to give any good guarantee of anonymity in this situation.

This will possibly be used for my MSc thesis topic.

Gilles 'SO- stop being evil'
  • 44,159
  • 8
  • 120
  • 184
The Unfun Cat
  • 1,803
  • 2
  • 19
  • 29

1 Answers1

2

The point of k-anonymity is that you can't uniquely identify your patients. So I will rephrase your question:

Given two anonymized tuples $x$ and $y$, can we tell if they are anonymizations of the same person?

Let's suppose for purposes of contradiction that we could. Then this means there is a "meta-tuple" which uniquely identifies a patient. But this violates anonymity (unless $k=1$). So it is impossible.

Xodarap
  • 1,538
  • 1
  • 10
  • 17