Decision Tree (Classification)


  1. Try a split on each remaining feature and compute the impurity of each candidate tree
    1. There are multiple ways to calculate impurity, e.g., Gini impurity or entropy
    2. If the feature is categorical, use its categories as the branches
    3. If the feature is continuous,
      1. Sort data based on that column
      2. For each pair of consecutive rows, find the mean of their values
      3. Use that mean as the threshold of a candidate node, e.g., age < 7, where 7 is the mean of two consecutive rows
  2. Take the candidate tree with the lowest impurity
  3. If no impure node is left or the predefined depth limit is reached, STOP.
  4. Otherwise, go back to step 1
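
The threshold search for a continuous feature (steps 1.3 and 2) can be sketched in Python roughly as follows. This is a minimal illustration, not a full tree builder: it assumes Gini impurity as the impurity measure (one of several options), and the helper names `gini`, `weighted_impurity`, `candidate_thresholds`, and `best_threshold`, as well as the toy `ages`/`likes` data, are made up for this example. It also assumes duplicate values are collapsed before taking midpoints, which the steps above do not specify.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_impurity(groups):
    """Impurity of a split: each branch's Gini weighted by its share of rows."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini(g) for g in groups if g)


def candidate_thresholds(values):
    """Midpoints between consecutive sorted values (duplicates collapsed, an assumption)."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def best_threshold(values, labels):
    """Return (threshold, impurity) for the lowest-impurity split `value < threshold`."""
    best = None
    for t in candidate_thresholds(values):
        left = [y for x, y in zip(values, labels) if x < t]
        right = [y for x, y in zip(values, labels) if x >= t]
        score = weighted_impurity([left, right])
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: age of a person and whether they like a movie.
ages = [5, 9, 3, 12, 15]
likes = ["no", "yes", "no", "yes", "yes"]
print(best_threshold(ages, likes))  # (7.0, 0.0)
```

On this toy data the midpoint 7 (between ages 5 and 9) separates the two classes perfectly, so the split `age < 7` has impurity 0 and would be chosen in step 2. A categorical feature would skip the threshold search and pass one group of labels per category to `weighted_impurity`.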


  1. Overfitting
  2. Missing data - Follow guides on Handling Missing Data