Handling Missing Data

[!question] How to handle Missing Data?

  1. Data imputation
     • Mean
     • Median
     • Mode
     • Replace using the most correlated feature (see "Finding correlation between two data or distributions")
     • Assign a new category, e.g. "unknown" (for categorical values)
     • Interpolation (time series data)
     • Use K-nearest neighbors (KNN) to estimate the missing value
  2. Deletion
     • Remove the column (feature)
     • Remove the row (data point)
  3. Use algorithms that can work with missing data (support varies by implementation)
     • K-nearest neighbors (KNN)
     • Naive Bayes
     • XGBoost
     • Random Forest
  4. Train a model to predict and replace the missing data
  • Imputation works well only when the number of missing values is small relative to the size of the dataset
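
The imputation and deletion strategies listed above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: missing values are assumed to be represented as `None`, the time series is assumed to be evenly spaced, and every function name below (`impute_constant`, `impute_unknown`, `interpolate_linear`, `impute_knn`, `drop_incomplete_rows`) is hypothetical, chosen only for this example.

```python
from statistics import mean, median, mode


def impute_constant(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def impute_unknown(values, label="unknown"):
    """Treat missing categorical values as their own category."""
    return [label if v is None else v for v in values]


def interpolate_linear(values):
    """Linearly fill None gaps between known points of an evenly spaced series.

    Leading and trailing gaps have no neighbor on one side and stay None.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


def impute_knn(rows, target, k=2):
    """Fill None in column `target` with the mean of that column over the
    k nearest complete rows (squared Euclidean distance on the other columns).

    Assumption: rows are tuples of numbers and only `target` can be missing.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if row[target] is None:
            def distance(other, row=row):
                return sum(
                    (a - b) ** 2
                    for i, (a, b) in enumerate(zip(row, other))
                    if i != target
                )
            nearest = sorted(complete, key=distance)[:k]
            estimate = mean(r[target] for r in nearest)
            row = row[:target] + (estimate,) + row[target + 1:]
        filled.append(row)
    return filled


def drop_incomplete_rows(rows):
    """Deletion: keep only the rows (data points) with no missing value."""
    return [row for row in rows if None not in row]


print(impute_constant([10, None, 20, 20, 70], "mean"))    # None becomes 30
print(impute_constant([10, None, 20, 20, 70], "median"))  # None becomes 20.0
print(impute_unknown(["red", None, "blue"]))
print(interpolate_linear([1.0, None, None, 4.0, None]))   # [1.0, 2.0, 3.0, 4.0, None]
print(impute_knn([(1.0, 10.0), (2.0, 20.0), (3.0, None), (10.0, 100.0)], target=1))
print(drop_incomplete_rows([(1, "a"), (2, None), (None, "c")]))
```

In practice, pandas (`fillna`, `interpolate`, `dropna`) and scikit-learn (`SimpleImputer`, `KNNImputer`) provide tested equivalents of these helpers; the hand-written versions are only meant to show what each strategy does.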

TODO:

  1. Learn how each algorithm listed above (KNN, Naive Bayes, XGBoost, Random Forest) handles missing data