tf–idf

‘Term frequency–inverse document frequency is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.’

From Wikipedia via NaturalNode documentation

  • How many times does a word appear in a document?
  • How many documents in the given collection (corpus) contain the word?
    • This gives an indication of how common this word is
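
The two questions above can be combined into a single score. What follows is a minimal sketch, assuming one common variant: tf is the raw count divided by the document length, and idf is the natural log of (number of documents / number of documents containing the word). Libraries often use smoothed or otherwise different formulas, so their numbers will not necessarily match. The function name `tf_idf` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Score `term` for `doc`, where `doc` is a list of tokens
    and `corpus` is a list of such token lists (assumed variant, see note)."""
    # How many times does the word appear in the document (length-normalised)?
    tf = Counter(doc)[term] / len(doc)
    # How many documents in the corpus contain the word?
    df = sum(1 for d in corpus if term in d)
    # Words found in every document get idf = log(1) = 0.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


# Hypothetical toy corpus, tokenised by whitespace.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

# Score every distinct word of the first document.
scores = {t: tf_idf(t, corpus[0], corpus) for t in set(corpus[0])}
```

In this toy corpus "the" occurs in every document, so its score is zero even though it is the most frequent word in the first document, while "mat", which occurs in only one document, scores higher than "cat", which occurs in two.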

See NLP (Natural language processing)

#review #definition