I am about to kick off a large hackathon event.

We have a dataset consisting of one high-precision continuous variable and a number of categorical variables that qualify the data three levels deep.

The data provider wants to 'mask' the data so that the original values cannot be reverse-engineered. I'm not worried about the categorical variables; those are simple. The continuous variable is the tricky part.

  1. A logarithmic transformation is easily reverse-engineered.
  2. A nonlinear transformation is better, but it distorts the relationships between values across categories.
  3. A pure linear transformation would work, but doesn't seem to 'mask' enough.

I need to preserve the relationships between numbers whilst also protecting the actual, true values.

Ideas greatly appreciated.

HEITZ

1 Answer

I think you can use a much more complicated monotonic transformation, like

log(1.234578 + sqrt(x + 7.4142) ** 3)

which will be harder to invert than a simple log. But, as Nikos says, strictly monotonic functions are invertible, so all you can do is make it very hard to compute the inverse by composing many monotonic functions.
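As a rough sketch of that trade-off, here is the expression above in Python, together with its closed-form inverse. The sample values and the names `mask` and `unmask` are made up for illustration, and the transformation assumes x > -7.4142 so that the square root is defined.

```python
import math

def mask(x: float) -> float:
    """The composed monotonic transformation from above (requires x > -7.4142)."""
    return math.log(1.234578 + math.sqrt(x + 7.4142) ** 3)

def unmask(y: float) -> float:
    """Closed-form inverse: anyone who knows the formula can undo the mask."""
    return (math.exp(y) - 1.234578) ** (2.0 / 3.0) - 7.4142

# Hypothetical sample data, sorted in increasing order.
values = [0.5, 3.25, 10.0, 42.125, 1000.0]
masked = [mask(v) for v in values]

# A strictly increasing function keeps the ordering of the values intact.
order_preserved = all(a < b for a, b in zip(masked, masked[1:]))

# Inverting recovers the originals up to floating-point error.
recovered = [unmask(m) for m in masked]

print(order_preserved)
print(recovered)
```

Note that only the ordering survives: differences and ratios between masked values are not the same as between the originals, and the protection rests entirely on the formula and its constants staying secret.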

David Masip