K-Nearest Neighbors (KNN)

  • KNN is a supervised learning algorithm
  • AKA lazy learner
    • as it builds no explicit model during training
    • but memorizes the whole training data
    • and classifies a new point by comparing it against the stored data at prediction time
  • Steps
    1. Start with labeled data
    2. For a new data point
      1. Find its K nearest neighbors
      2. Assign it the majority class among those neighbors
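
A minimal sketch of the steps above, in plain Python. The function name `knn_predict`, the toy points, and the choice of Euclidean distance are all assumptions made for illustration; the notes do not prescribe a particular metric or implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2.1: measure the distance from the query to every stored point.
    # Euclidean distance is an assumption; any distance metric could be used.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 2.2: the most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Step 1: labeled data (hypothetical toy values).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
```

Note that there is no training function at all: the "model" is just the stored data, which is why all the work happens at prediction time.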
  • Set K to an odd number to avoid ties (this only guarantees no ties when there are two classes)
    • If there is still a tie, take a random neighbor's class or disregard the data point
  • Find the optimal K using dev (validation) data
    • Too low a value of K is noisy and susceptible to outliers
    • Too large a value of K tends to always predict the most frequent class
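
One way to pick K on dev data could look like the sketch below. The helper names (`knn_predict`, `dev_accuracy`, `choose_k`), the toy data, and the Euclidean metric are assumptions for illustration only. The training set deliberately contains one mislabeled point inside the "a" cluster, and class "b" is the larger class, so both failure modes show up.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training pairs closest to `query`."""
    # `train` is a list of (point, label) pairs; Euclidean distance is assumed.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def dev_accuracy(train, dev, k):
    """Fraction of dev points classified correctly with this k."""
    correct = sum(knn_predict(train, point, k) == label for point, label in dev)
    return correct / len(dev)

def choose_k(train, dev, candidate_ks):
    """Return the candidate k with the highest dev accuracy."""
    return max(candidate_ks, key=lambda k: dev_accuracy(train, dev, k))

# Hypothetical toy data: 4 "a" points, 5 "b" points, and one
# mislabeled "b" outlier sitting inside the "a" cluster.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.2), "a"), ((1.3, 1.3), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.8), "b"), ((4.8, 5.2), "b"),
    ((5.1, 5.1), "b"), ((4.9, 4.7), "b"),
    ((1.1, 1.1), "b"),  # outlier
]
dev = [
    ((1.1, 1.15), "a"), ((0.9, 0.9), "a"),
    ((5.0, 4.9), "b"), ((4.9, 5.0), "b"),
]

best_k = choose_k(train, dev, [1, 3, 9])
```

On this toy data, K=1 misclassifies the dev point that sits next to the outlier, K=9 predicts "b" for everything because "b" is the majority of the training set, and K=3 gets every dev point right, so `choose_k` selects 3.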
  • Distance Metric: