I am reading a presentation that recommends against leave-one-out encoding but is fine with one-hot encoding. I thought the two were the same. Can anyone describe the differences between them?

Ethan
icm

1 Answer

They are probably using "leave one out encoding" to refer to Owen Zhang's strategy.

From here

The encoded column is not a conventional dummy variable, but instead is the mean response over all rows for this categorical level, excluding the row itself. This gives you the advantage of having a one-column representation of the categorical while avoiding direct response leakage

This picture expresses the idea well. (image)
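To make the difference concrete, here is a minimal sketch in plain Python. The toy data, the variable names, and the choice to fall back to the global mean for a level that appears only once are all my own assumptions for illustration; they are not taken from the presentation or from Owen Zhang's material.

```python
from collections import defaultdict

# Hypothetical toy data: one categorical feature and a binary response.
levels = ["a", "a", "a", "b", "b", "c"]
y = [1, 0, 1, 0, 1, 1]

# One-hot encoding: one 0/1 column per level; the response is never used.
categories = sorted(set(levels))
one_hot = [[int(lv == c) for c in categories] for lv in levels]

# Leave-one-out encoding: a single column holding the mean response of the
# *other* rows that share the same level.
level_sum = defaultdict(float)
level_count = defaultdict(int)
for lv, target in zip(levels, y):
    level_sum[lv] += target
    level_count[lv] += 1

# Assumed fallback for a level seen only once (no other rows to average).
global_mean = sum(y) / len(y)

loo = []
for lv, target in zip(levels, y):
    n_others = level_count[lv] - 1
    if n_others == 0:
        loo.append(global_mean)
    else:
        loo.append((level_sum[lv] - target) / n_others)
```

So one-hot encoding turns the column into three indicator columns and ignores the response, while leave-one-out encoding keeps a single numeric column whose value for each row depends on the responses of the other rows with the same level.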

Zephyr
dggg