Handling Outliers

[!question] How to find Outliers?

  1. Using standard deviation, if the data is approximately normally distributed
  2. Using Z-score
  3. Using Box Plot
  4. Using Interquartile Range (IQR)
  5. Using Quantile or Percentile cutoffs
  6. Using dedicated detection algorithms (e.g. Isolation Forest, Local Outlier Factor, DBSCAN)
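
Two of the methods listed above, the Z-score rule and the IQR rule, can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the function names, the sample data, and the cutoffs (`threshold=3.0`, `k=1.5`) are assumptions chosen because they are common conventions, and the right cutoffs depend on the data.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds `threshold` (assumed cutoff)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is the usual box-plot rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: ten ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

print(zscore_outliers(data))
print(iqr_outliers(data))
```

One caveat with the Z-score rule: the outlier itself inflates the mean and standard deviation, so with the population standard deviation no point in a sample of size n can have a Z-score above sqrt(n - 1). For ten or fewer points a cutoff of 3 therefore flags nothing. The IQR rule is based on quartiles and is less affected by the extreme values it is trying to find.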

[!def] Machine learning algorithms sensitive to outliers

  1. Linear Regression
  2. Logistic Regression
  3. Support Vector Machine (SVM), hard margin
  4. K-Nearest Neighbors (KNN)
  5. K-means Clustering
  6. Hierarchical Clustering
  7. Principal Component Analysis (PCA)
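
To make the sensitivity concrete for the first entry, linear regression, the sketch below fits an ordinary least-squares line twice: once on clean data and once after a single target value is replaced by an extreme one. The helper `fit_line` and the data are hypothetical and only meant to show the effect of squared-error loss, where one large residual dominates the fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = list(range(10))
ys_clean = [2 * x + 1 for x in xs]      # points exactly on y = 2x + 1
ys_outlier = ys_clean[:-1] + [100]      # last target replaced (was 19)

slope_clean, intercept_clean = fit_line(xs, ys_clean)
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

print(f"clean:        slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"with outlier: slope={slope_outlier:.2f}, intercept={intercept_outlier:.2f}")
```

A single corrupted point out of ten more than triples the fitted slope here. Mean-based and distance-based methods such as K-means, KNN, and PCA are pulled by extreme values for a similar reason.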

[!def] Machine learning algorithms NOT sensitive to outliers

  1. Decision Tree
  2. Random Forest
  3. Support Vector Machine (SVM), soft margin
  4. XGBoost
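  5. AdaBoost (with caution: it repeatedly up-weights misclassified points, so noisy labels and outliers can still hurt it)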
  6. Naive Bayes
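
One reason tree-based models tolerate outliers in the input features is that a split depends only on the ordering of feature values, not on their magnitude. The sketch below is a hypothetical one-feature decision stump (the function `best_stump` and the data are assumptions for illustration, not any library's API) showing that an extreme feature value leaves the chosen split unchanged.

```python
def best_stump(xs, labels):
    """Find the threshold minimising errors for the rule 'predict 1 if x > threshold'.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (threshold, number_of_errors).
    """
    pairs = sorted(zip(xs, labels))
    best = None
    for (x_lo, _), (x_hi, _) in zip(pairs, pairs[1:]):
        if x_lo == x_hi:
            continue  # no threshold fits between equal values
        threshold = (x_lo + x_hi) / 2
        errors = sum((x > threshold) != bool(y) for x, y in pairs)
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


xs = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
xs_outlier = xs[:-1] + [9000]  # largest feature value made extreme

print(best_stump(xs, labels))
print(best_stump(xs_outlier, labels))
```

This only covers outliers in the features. Extreme target values in a regression tree still shift the leaf averages, so the robustness is not unconditional.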

Todo:

  1. Explain how each of the algorithms above handles outliers