Random Forest
Steps
- Create S different datasets from the raw dataset, with len(S) = B
- They will be slightly different because samples are drawn at random with replacement
- On each dataset, use a random subset of features rather than all features
- Learn S different decision trees, one per dataset
- Combine them for prediction (a minimal sketch of these steps follows this list)
- For regression, take the average
- For classification, take the majority vote
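Below is a minimal sketch of these steps in Python, assuming scikit-learn is available for the individual trees; the names (S, feats), the toy dataset, and the choice of 4 features per tree are illustrative assumptions, not fixed by the notes.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

S = 25  # number of bootstrapped datasets / trees (illustrative)
trees, feature_subsets = [], []

for _ in range(S):
    # Bootstrap: sample rows with replacement, so each dataset is slightly different
    idx = rng.integers(0, len(X), size=len(X))
    # Random subset of features for this tree rather than all features
    feats = rng.choice(X.shape[1], size=4, replace=False)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx][:, feats], y[idx]))
    feature_subsets.append(feats)

def forest_predict(x):
    # Classification: majority vote over the S trees
    # (for regression, you would average the tree outputs instead)
    votes = [t.predict(x[feats].reshape(1, -1))[0]
             for t, feats in zip(trees, feature_subsets)]
    return Counter(votes).most_common(1)[0][0]

preds = np.array([forest_predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```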
[!question] What are the advantages of Random Forest?
- Because sampling is random, the effect of noise, outliers, and an imbalanced dataset is reduced (a rough check is sketched below)
- So Random Forest is good at handling missing data, outliers, and imbalanced datasets
- But it is bad for interpretability, which is far better in a single Decision Tree
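As a rough check of the robustness claim, the sketch below compares a single decision tree with scikit-learn's RandomForestClassifier on a dataset with label noise (flip_y); the dataset size, noise level, and model settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative noisy dataset: flip_y=0.2 flips 20% of the labels at random
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# The single tree tends to overfit the label noise; the forest averages it out
print("decision tree test accuracy:", tree.score(X_te, y_te))
print("random forest test accuracy:", forest.score(X_te, y_te))
```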