Rank similar or related documents by text analysis

Take a bunch of text files and run them through some program that generates a ‘graph’ of similarity.

How can this be used to bring up not only bidirectional links but undiscovered n-directionality?

Should probably use tf-idf to pick out the keywords that set each document apart from the rest of the corpus.
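A minimal sketch of what that could look like, using only the Python standard library. The function names (`tokenize`, `tfidf`, `top_keywords`, `cosine`) and the sample notes are made up for illustration, and the weighting is the plain textbook variant (term frequency times log of inverse document frequency), not necessarily what any of the tools linked in this note use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """docs maps a note name to its text; returns name -> {term: weight}."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    # Document frequency: in how many notes does each term occur?
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    n = len(docs)
    weights = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        total = len(toks) or 1
        # A term found in every note gets log(1) = 0, so it carries no weight.
        weights[name] = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return weights


def top_keywords(weights, k=3):
    # Highest weight first; ties are broken alphabetically.
    return {
        name: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for name, w in weights.items()
    }


def cosine(a, b):
    # Cosine similarity between two sparse term -> weight vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sample notes, just to exercise the functions.
notes = {
    "a": "tf-idf picks keywords from notes",
    "b": "keywords from notes can link related notes",
    "c": "buy milk and eggs",
}
w = tfidf(notes)
print(top_keywords(w))
print(cosine(w["a"], w["b"]), cosine(w["a"], w["c"]))
```

The cosine score between two notes' tf-idf vectors is one possible edge weight for the similarity ‘graph’. On a real note collection a stop-word list and some stemming would probably be needed before the keywords look sensible.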

This would be especially helpful to discover connections between notes that aren’t linked yet. Whenever I review my note inbox, I discover related notes, often made on the go or in a hurry, which are similar, sometimes almost the same, but aren’t connected to anything else.

This is a tedious manual process, and it would be great to have a tool that could bunch these up and present them to me for review.
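A rough sketch of the ‘bunch these up’ step, assuming the notes are already loaded into a dict of name to text. To keep it self-contained it scores pairs with plain Jaccard overlap of word sets rather than anything smarter, and the names `similarity_edges` and `review_groups`, the 0.5 threshold and the sample inbox are all placeholders.

```python
import re
from itertools import combinations


def words(text):
    # The set of lowercase word tokens in a note.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Shared words divided by all words; 0.0 if either note is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_edges(notes, threshold=0.5):
    """Return (note_x, note_y, score) for every pair at or above threshold."""
    sets = {name: words(text) for name, text in notes.items()}
    edges = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            edges.append((x, y, score))
    return edges


def review_groups(notes, threshold=0.5):
    """Group notes that are connected, directly or via other notes."""
    parent = {name: name for name in notes}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y, _ in similarity_edges(notes, threshold):
        parent[find(x)] = find(y)

    groups = {}
    for name in notes:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one note are worth reviewing.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical inbox contents.
inbox = {
    "inbox-1": "idea: rank related notes by text similarity",
    "inbox-2": "rank related notes by text similarity idea",
    "inbox-3": "grocery list milk eggs",
}
print(review_groups(inbox))
```

Because groups are connected components of the similarity graph, a note can land in a group with another note it only resembles indirectly, through a third one. That is one cheap way to surface chains of relatedness rather than just pairs.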


I recently had a brief discussion about this problem space with my brother Patrick Gorton, a computer scientist. It didn’t take many days before he sent me a rough prototype!

I have yet to test this out in action, but will very soon! See https://github.com/plaffa/text-relations

And a reply from him: similarity between texts


Also check out http://naturalnode.github.io/natural/string_distance.html

https://github.com/NaturalNode/natural

Some people in the Obsidian community are looking into something like this: https://forum.obsidian.md/t/find-similar-notes-python-script/9450/8

‘Map of tools for finding related notes in personal knowledge bases (PKB) – in Roam, Obsidian, Notion, etc.’ https://github.com/taivop/pkb-related-notes

‘Even the best current ‘tools for thought’ apps require you to remember to manually make all the connections between your ideas.’ https://twitter.com/thesephist/status/1458218178966790145

See NLP (Natural language processing)