I have studied the Random Forest and RainForest papers, but they are a bit confusing! In summary, I understand the following steps for these algorithms. Could you help me find out whether I am right?

I appreciate your help.

In Random Forest:

  1. define the number of trees
  2. partition the data by bootstrapping
  3. construct a tree on each partition (at each node, a subsample of the features is selected)
  4. label the leaf nodes
  5. to classify a new instance, vote over all trees.
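
The five steps above can be sketched in plain Python. This is a minimal illustration, not the reference algorithm or any library's API: it assumes numeric features, binary threshold splits scored with the Gini index, and a small depth limit, and every function name (`random_forest`, `build_tree`, `predict_forest`, ...) is hypothetical. Note that step 2 draws each bootstrap sample with replacement from the full dataset, so the samples overlap rather than forming a strict partition.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(X, y, n_sub_features, max_depth, rng):
    """Grow one tree; each node tries only a random subset of the features (step 3)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return {"label": majority}                 # step 4: leaf labelled with its majority class
    best = None
    for f in rng.sample(range(len(X[0])), n_sub_features):
        for threshold in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            if not left or not right:
                continue
            # Weighted Gini impurity of the two children
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:                               # no feature in the subset can split this node
        return {"label": majority}
    _, f, threshold, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          n_sub_features, max_depth - 1, rng)

    return {"feature": f, "threshold": threshold, "left": child(left), "right": child(right)}

def predict_tree(node, x):
    while "label" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["label"]

def random_forest(X, y, n_trees=25, n_sub_features=1, max_depth=3, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):                                   # step 1: fixed number of trees
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # step 2: bootstrap (with replacement)
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                n_sub_features, max_depth, rng))
    return trees

def predict_forest(trees, x):
    """Step 5: majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]
```

For example, with `X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]` and `y = ["a", "a", "a", "b", "b", "b"]`, `predict_forest(random_forest(X, y), [1, 1])` returns the majority vote of 25 trees. The per-node `rng.sample` call is what distinguishes this from plain bagging of decision trees.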

In RainForest:

  1. Partition the dataset
  2. Build the AVC-set of a partition
  3. Build a tree over the partition by computing a purity criterion (like the Gini index) over the AVC-sets
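
To make steps 2 and 3 concrete, here is a small sketch of an AVC-set (Attribute-Value-Class counts) and of scoring a split from it. It assumes categorical attributes with one child per attribute value and uses the Gini index only because the question mentions it; the helper names and the toy weather-style rows are made up for illustration.

```python
from collections import Counter, defaultdict

def build_avc_set(rows, attribute, class_key="class"):
    """One scan over the node's rows: count each (attribute value, class label) pair."""
    avc = defaultdict(Counter)
    for row in rows:
        avc[row[attribute]][row[class_key]] += 1
    return avc

def gini_from_counts(class_counts):
    n = sum(class_counts.values())
    return 1.0 - sum((c / n) ** 2 for c in class_counts.values()) if n else 0.0

def split_gini(avc):
    """Weighted Gini of a split with one child per attribute value, from the AVC-set alone."""
    total = sum(sum(counts.values()) for counts in avc.values())
    return sum(sum(counts.values()) / total * gini_from_counts(counts) for counts in avc.values())

# Hypothetical toy partition
rows = [
    {"outlook": "sunny", "windy": "no", "class": "no"},
    {"outlook": "sunny", "windy": "yes", "class": "no"},
    {"outlook": "rain", "windy": "no", "class": "yes"},
    {"outlook": "rain", "windy": "yes", "class": "no"},
    {"outlook": "overcast", "windy": "no", "class": "yes"},
    {"outlook": "overcast", "windy": "yes", "class": "yes"},
]
avc_group = {a: build_avc_set(rows, a) for a in ("outlook", "windy")}
best = min(avc_group, key=lambda a: split_gini(avc_group[a]))
print(best, {a: round(split_gini(s), 3) for a, s in avc_group.items()})
# prints: outlook {'outlook': 0.167, 'windy': 0.444}
```

The split decision reads only `avc_group`, never `rows`, which is the property RainForest relies on.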

1 Answer

Random forest is a learning algorithm: an ensemble learning algorithm that uses decision trees as base learners. Your steps for it are correct.

RainForest is not a learning algorithm. It is an algorithm for constructing a decision tree (specifically, for deciding how to split) when the dataset is so large that it does not fit in memory. In RainForest, the whole dataset is not needed to make a splitting decision; only some aggregated information (the AVC-set for an attribute, or the AVC-group if you have more memory) is required.
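
The point that only aggregated counts are needed can be illustrated with two hypothetical helpers that never keep the rows themselves. The first assumes the whole AVC-group fits in memory and fills it in a single scan; the second holds one AVC-set at a time and re-scans the data once per attribute, trading extra I/O for less memory. `make_rows` is an assumed stand-in for re-opening a large file, and the in-memory CSV string exists only to keep the example self-contained.

```python
import csv
import io
from collections import Counter, defaultdict

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def split_score(avc):
    """Weighted Gini of a one-child-per-value split, from aggregated counts only."""
    total = sum(sum(c.values()) for c in avc.values())
    return sum(sum(c.values()) / total * gini(c) for c in avc.values())

def best_split_with_avc_group(make_rows, attributes, class_key):
    """Memory holds the whole AVC-group: one scan fills the AVC-sets of every attribute."""
    group = {a: defaultdict(Counter) for a in attributes}
    for row in make_rows():
        for a in attributes:
            group[a][row[a]][row[class_key]] += 1
    return min(attributes, key=lambda a: split_score(group[a]))

def best_split_one_avc_set_at_a_time(make_rows, attributes, class_key):
    """Memory holds only one AVC-set: re-scan the data once per attribute."""
    best_attr, best_score = None, float("inf")
    for a in attributes:
        avc = defaultdict(Counter)
        for row in make_rows():                  # a fresh sequential scan for each attribute
            avc[row[a]][row[class_key]] += 1
        score = split_score(avc)
        if score < best_score:
            best_attr, best_score = a, score
    return best_attr

# Hypothetical stand-in for a file too large for memory: each call starts a new scan
# that yields one row at a time (in practice this might re-open a file on disk).
DATA = """outlook,windy,class
sunny,no,no
sunny,yes,no
rain,no,yes
rain,yes,no
overcast,no,yes
overcast,yes,yes
"""

def make_rows():
    return csv.DictReader(io.StringIO(DATA))

print(best_split_with_avc_group(make_rows, ["outlook", "windy"], "class"))
print(best_split_one_avc_set_at_a_time(make_rows, ["outlook", "windy"], "class"))
```

Both helpers choose the same attribute; they differ only in the number of passes over the data and in how much they keep between rows. The memory needed grows with the number of distinct (value, class) pairs, not with the number of rows.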

If your dataset is large and memory is small, you can use RainForest to build several different decision trees, and then use the random forest algorithm with those trees as base learners.
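
One way to combine the two ideas is sketched below, under several assumptions that go beyond the answer. To avoid materialising bootstrap samples of a dataset that does not fit in memory, it swaps in online bagging (Oza and Russell's technique of giving each row a Poisson(1) weight per tree, which approximates sampling with replacement in a single pass). Each tree is also cut down to a stump (a single split), so one scan that fills weighted AVC-groups is enough; deeper trees would need further passes over the data. All names are hypothetical.

```python
import csv
import io
import math
import random
from collections import Counter, defaultdict

def poisson1(rng):
    """Draw k ~ Poisson(1) with Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def gini(counts):
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0

def split_score(avc):
    total = sum(sum(c.values()) for c in avc.values())
    if total == 0:
        return float("inf")                        # this tree's bootstrap sample was empty
    return sum(sum(c.values()) / total * gini(c) for c in avc.values())

def build_forest_of_stumps(make_rows, attributes, class_key, n_trees=15, n_sub=1, seed=0):
    rng = random.Random(seed)
    # Random attribute subset per tree (a stump has one split node, so per tree = per node).
    subsets = [rng.sample(attributes, n_sub) for _ in range(n_trees)]
    groups = [{a: defaultdict(Counter) for a in s} for s in subsets]
    for row in make_rows():                        # one sequential scan feeds every tree
        for group in groups:
            w = poisson1(rng)                      # copies of this row in this tree's bootstrap sample
            if w:
                for a, avc in group.items():
                    avc[row[a]][row[class_key]] += w
    forest = []
    for group in groups:
        attr = min(group, key=lambda a: split_score(group[a]))
        # Each leaf (attribute value) is labelled with its majority class.
        leaves = {value: counts.most_common(1)[0][0] for value, counts in group[attr].items()}
        forest.append((attr, leaves))
    return forest

def predict(forest, row):
    """Majority vote; trees that never saw this attribute value abstain."""
    votes = Counter(leaves[row[attr]] for attr, leaves in forest if row[attr] in leaves)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy data standing in for a large file
DATA = """outlook,windy,class
sunny,no,no
sunny,yes,no
rain,no,yes
rain,yes,yes
overcast,no,yes
overcast,yes,yes
"""

def make_rows():
    return csv.DictReader(io.StringIO(DATA))

forest = build_forest_of_stumps(make_rows, ["outlook", "windy"], "class")
print(predict(forest, {"outlook": "sunny", "windy": "no"}))
```

Growing deeper trees would repeat the AVC-group step at each new node, counting only the rows routed to that node.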

Vladislav Gladkikh