Handling Missing Data
[!question] How to handle Missing Data?
- Data Imputation
    - Mean
    - Median
    - Mode
    - Replace with a value estimated from the most correlated feature (see: Finding Co-relation between two data or distribution)
    - Assign a new category, e.g. "unknown" (for categorical values)
    - Interpolation (time series data)
    - Use K-nearest Neighbors (KNN) to impute from the most similar rows
- Deletion
    - Remove the column (feature)
    - Remove the row (data point)
- Use algorithms that can work with missing data (support depends on the implementation)
    - K-nearest Neighbors (KNN)
    - Naive Bayes
    - XGBoost
    - Random Forest
- Train a model to predict and replace the missing data
- Imputation works well only when a small fraction of the values is missing
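
A minimal sketch of the simpler options above (mean/median/mode, new category, interpolation, deletion), in plain Python with `None` marking a missing value. All function names and the `max_missing` threshold are hypothetical and chosen only for illustration; in practice a library such as pandas or scikit-learn would probably be used instead.

```python
from statistics import mean, median, mode


def impute_constant(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def impute_category(values, label="unknown"):
    """Treat missingness as its own category (categorical features only)."""
    return [label if v is None else v for v in values]


def interpolate_linear(values):
    """Fill gaps between two observed points by linear interpolation.

    Assumes evenly spaced samples. Leading and trailing gaps are left as None
    because there is no second neighbour to interpolate towards.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


def drop_incomplete_rows(rows):
    """Deletion by row: keep only rows without any missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def drop_sparse_columns(rows, max_missing=0.5):
    """Deletion by column: drop features whose missing fraction exceeds max_missing."""
    n = len(rows)
    keep = [
        j for j in range(len(rows[0]))
        if sum(row[j] is None for row in rows) / n <= max_missing
    ]
    return [tuple(row[j] for j in keep) for row in rows]
```

For example, `impute_constant(ages, "median")` is usually a safer default than `"mean"` when the feature has outliers, and `interpolate_linear` only makes sense when the order of the values carries meaning, as in a time series.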
TODO:
- Learn how the algorithms handle missing data