I'm trying to split on an attribute "Color" that has five possible values (Blue, Green, Red, Orange, Pink).

I'm choosing the split by entropy, and the best split could be a 5-way, 4-way, 3-way, or binary split. For example:

5: (Blue), (Green), (Red), (Orange), (Pink)

4: (Blue, Green), (Red), (Orange), (Pink)
   (Green, Pink), (Blue), (Red), (Orange)

3: (Red, Orange), (Blue, Green), (Pink)
   (Red, Blue), (Green, Orange), (Pink)

2: (Blue, Green, Red), (Orange, Pink)
   (Pink), (Blue, Green, Red, Orange)

And so on. How can I generate a complete list of all the possible splits? Is there a specific algorithm I could use? And how would I work out how many possible splits there are in total?

Any help would be greatly appreciated, thanks!!!

Raphael
ocean800

1 Answer

The conditional entropy of the class given a partition of the attribute's values is non-increasing under refinement of that partition. This means that, judged by entropy alone, the best possible partition is the one into singletons (one value per set). This is the method used for discrete attributes in well-known algorithms such as ID3 and C4.5.
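To see the refinement claim on numbers, here is a minimal sketch in Python. The data in `rows` is made up purely for illustration, and the helper names `entropy` and `split_entropy` are my own, not taken from any library. It computes the weighted class entropy for a binary split, a refinement of it, and the all-singletons split.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def split_entropy(rows, partition):
    """Weighted average class entropy over the blocks of `partition`."""
    total = len(rows)
    result = 0.0
    for block in partition:
        labels = [label for value, label in rows if value in block]
        if labels:  # skip blocks that match no rows
            result += len(labels) / total * entropy(labels)
    return result

# Hypothetical toy data: (Color, class label) pairs, invented for this example.
rows = [
    ("Blue", "yes"), ("Blue", "yes"),
    ("Green", "no"), ("Green", "no"),
    ("Red", "yes"), ("Red", "no"),
    ("Orange", "no"), ("Pink", "yes"),
]

# Each partition below refines the one before it.
coarse = [{"Blue", "Green", "Red"}, {"Orange", "Pink"}]
finer = [{"Blue"}, {"Green", "Red"}, {"Orange", "Pink"}]
singletons = [{"Blue"}, {"Green"}, {"Red"}, {"Orange"}, {"Pink"}]

for name, partition in [("coarse", coarse), ("finer", finer), ("singletons", singletons)]:
    print(name, round(split_entropy(rows, partition), 4))
```

On this toy data the printed entropies go down (or stay equal) at each refinement step, which is the behaviour described above. It is an illustration on one data set, not a proof.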

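As a side note, the number of partitions of a set of n values is the Bell number B_n. For your five colors that is B_5 = 52, which includes the trivial partition that keeps all values in one set (no split at all).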
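If you do want the comprehensive list, a small recursive generator is enough for an attribute with this few values. The sketch below is one common way to enumerate set partitions, not something specific to ID3 or C4.5; the function name `set_partitions` is my own. It also groups the results by number of blocks, which answers the "how many 2-way, 3-way, ... splits" part of the question.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition

colors = ["Blue", "Green", "Red", "Orange", "Pink"]
all_partitions = list(set_partitions(colors))

# Number of partitions for each block count (1-way up to 5-way).
counts_by_blocks = Counter(len(p) for p in all_partitions)

print(len(all_partitions))
for k in sorted(counts_by_blocks):
    print(k, counts_by_blocks[k])
```

The per-block-count totals are the Stirling numbers of the second kind, and they sum to the Bell number. Keep in mind that the count grows very quickly with the number of values, so brute-force enumeration is only practical for small attributes.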

Ariel