Random Forest


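  1. Create S different datasets from the raw dataset by bootstrap sampling, each of size B
    1. They differ slightly from one another because rows are drawn at random with replacement
    2. When growing each tree, consider only a random subset of the features rather than all of them (in Breiman's standard formulation, a fresh subset is drawn at every split)
  2. Train one decision tree on each of the S datasets
  3. Combine the trees' outputs for prediction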
    1. For regression, take the average
    2. For classification, take the majority vote
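The three steps above can be sketched in plain Python. This is a minimal, illustrative classifier under stated assumptions, not a reference implementation: it uses a small CART-style tree with Gini impurity, draws a random feature subset at every split, defaults B to the size of the raw dataset, and defaults the subset size to the square root of the number of features. The names `fit_forest`, `predict_forest`, `build_tree`, `n_sub` and `max_depth` are hypothetical. For regression you would swap Gini for variance and the majority vote for a mean.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, n_sub, rng, depth=0, max_depth=5):
    """Grow a tree; internal nodes are (feature, threshold, left, right), leaves are labels."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    # Step 1.2 (assumed Breiman-style): random feature subset, re-drawn at each split
    features = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, features)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in li], [y[i] for i in li], n_sub, rng, depth + 1, max_depth),
        build_tree([X[i] for i in ri], [y[i] for i in ri], n_sub, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, S=25, B=None, n_sub=None, seed=0):
    """Train S trees, each on a bootstrap sample of size B (assumed default: len(X))."""
    rng = random.Random(seed)
    B = B or len(X)
    n_sub = n_sub or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(S):
        # Step 1.1: sample B row indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(B)]
        # Step 2: one tree per bootstrap dataset
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Step 3 (classification): majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

As a usage example, fitting `fit_forest` on ten rows `[[i, i] for i in range(10)]` labelled 0 for `i < 5` and 1 otherwise should classify `[0, 0]` as 0 and `[9, 9]` as 1. The exhaustive threshold search keeps the sketch short and readable but is far slower than production libraries.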

[!question] What are the advantages of Random Forest?

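  • Because each tree is trained on a different random sample of rows (and features), no single noisy point or outlier dominates, and averaging or voting over the trees reduces variance compared with a single decision tree; class-balanced sampling variants can also help with imbalanced datasets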
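The variance-reduction argument can be made concrete with the standard formula for the variance of an average. Assume (these symbols are introduced here, not in the note) that each tree's prediction $T_s$ at a fixed input has variance $\sigma^2$ and that any two trees have pairwise correlation $\rho$. Then:

$$
\operatorname{Var}\!\left(\frac{1}{S}\sum_{s=1}^{S} T_s\right)
= \frac{1}{S^2}\Big(S\sigma^2 + S(S-1)\rho\sigma^2\Big)
= \rho\sigma^2 + \frac{1-\rho}{S}\,\sigma^2
$$

Adding more trees shrinks the second term toward zero, while the random feature subsets aim to lower $\rho$, which is the only term more trees cannot remove.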

