TF-IDF
- TF-IDF measures how important a word is to one document within a collection of documents
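- TF = Term Frequency
	- how often the word appears in a document: its count divided by the total number of words in that document
- IDF = Inverse Document Frequency
	- how rare the word is across all documents: words that appear in many documents (e.g. "the") get a low IDF
- TF-IDF is the product of TF and IDF, so a word scores high when it is frequent in one document but rare across the collection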
> [!def] TF-IDF
> $$
> \text{TF-IDF}(w, d) = \frac{\text{count of } w \text{ in } d}{\text{total \# of words in } d} \cdot \log \frac{\text{total \# of docs}}{\text{\# of docs containing } w}
> $$
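
A minimal sketch of the definition above, assuming documents are already tokenized into lists of words and using the natural logarithm (the log base varies between sources). The helper names `tf`, `idf` and `tf_idf` and the three toy documents are illustrative, not from any library.

```python
import math
from collections import Counter


def tf(word: str, doc: list[str]) -> float:
    # Term frequency: occurrences of `word` divided by the number of words in `doc`.
    return Counter(doc)[word] / len(doc)


def idf(word: str, docs: list[list[str]]) -> float:
    # Inverse document frequency: log(total docs / docs containing `word`).
    # Assumes `word` occurs in at least one document (otherwise this divides by zero).
    n_containing = sum(1 for d in docs if word in d)
    return math.log(len(docs) / n_containing)


def tf_idf(word: str, doc: list[str], docs: list[list[str]]) -> float:
    # TF-IDF is simply the product of the two factors.
    return tf(word, doc) * idf(word, docs)


# Hypothetical toy corpus of three tokenized documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

print(tf_idf("the", docs[0], docs))  # → 0.0 ("the" is in every document, so IDF = log 1 = 0)
print(tf_idf("cat", docs[0], docs))  # "cat" is in 2 of 3 documents
print(tf_idf("mat", docs[0], docs))  # "mat" is in only 1 document, so it scores highest
```

"the" appears twice in the first document yet scores 0, while "mat" appears once and scores highest: the IDF factor cancels out words that occur everywhere.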
- TF-IDF can also be used as a word embedding: take the word's one-hot vector and replace the $1$ with the word's TF-IDF score.
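A small sketch of that replacement, assuming the vocabulary is the sorted set of all words in the corpus and the same natural-log TF-IDF as in the definition. `weighted_one_hot` is a hypothetical helper name used only for illustration.

```python
import math
from collections import Counter


def tf_idf(word: str, doc: list[str], docs: list[list[str]]) -> float:
    # TF: count of `word` in `doc` divided by the length of `doc`.
    tf = Counter(doc)[word] / len(doc)
    # IDF: log(total docs / docs containing `word`); natural log assumed.
    idf = math.log(len(docs) / sum(1 for d in docs if word in d))
    return tf * idf


def weighted_one_hot(word: str, doc: list[str], docs: list[list[str]],
                     vocab: list[str]) -> list[float]:
    # Start from an all-zero vector of vocabulary size, then put the TF-IDF
    # score in the slot where the one-hot encoding would have a 1.
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = tf_idf(word, doc, docs)
    return vec


# Hypothetical toy corpus of three tokenized documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
# Assumed vocabulary: every distinct word in the corpus, sorted alphabetically.
vocab = sorted({w for d in docs for w in d})

print(vocab)
print(weighted_one_hot("mat", docs[0], docs, vocab))
```

One consequence of this design is that a word occurring in every document (here "the") gets an all-zero vector, since its IDF is $\log 1 = 0$.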