Rank similar or related documents by text analysis
Take a bunch of text files and run them through a program that generates a ‘graph’ of similarity.
How can this be used to bring up not only bidirectional links but undiscovered n-directionality?
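A minimal sketch of such a ‘graph’ of similarity, assuming plain bag-of-words cosine similarity (word counts, not tf-idf weights) and an arbitrary threshold of 0.3. One possible reading of n-directionality is indirect links: notes that are not similar to each other directly but share a similar neighbour. The function names `similarity_graph` and `indirect_links` are hypothetical, made up for this illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and keep runs of letters/digits; a deliberately naive tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors stored as Counters.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(notes, threshold=0.3):
    # notes: dict of note name -> text. Returns an adjacency dict in which each
    # edge carries the cosine similarity of the two notes' word counts.
    vectors = {name: Counter(tokenize(text)) for name, text in notes.items()}
    graph = {name: {} for name in notes}
    for a, b in combinations(notes, 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def indirect_links(graph, name):
    # Notes exactly two hops away that are not directly linked to `name`:
    # candidate connections that a plain backlink would never show.
    direct = set(graph[name])
    two_hop = {n for neighbour in direct for n in graph[neighbour]}
    return two_hop - direct - {name}


notes = {
    "a": "zettelkasten notes linking ideas",
    "b": "linking ideas across notes",
    "c": "ideas across projects and teams",
}
graph = similarity_graph(notes)
print(indirect_links(graph, "a"))
```

Here "a" and "c" fall below the threshold on their own, but both are close to "b", so "c" is suggested as an indirect link for "a". Comparing every pair is quadratic in the number of notes, which is fine for a personal inbox but not for a large corpus.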
Should probably use tf-idf to pick out keywords for a corpus.
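A minimal sketch of tf-idf keyword extraction, assuming the plain textbook weighting: term frequency divided by document length, times log(N / df), where N is the number of documents and df is how many documents contain the word. Libraries often use smoothed variants instead. The function name `tfidf_keywords` is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; a deliberately naive tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    # docs: dict of name -> text. Returns name -> list of top_k words by tf-idf.
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(word for words in tokens.values() for word in set(words))
    keywords = {}
    for name, words in tokens.items():
        if not words:
            keywords[name] = []
            continue
        tf = Counter(words)
        scores = {
            # Length-normalised term frequency times inverse document frequency.
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so output is stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[name] = ranked[:top_k]
    return keywords


docs = {
    "x": "the cat sat on the mat",
    "y": "the dog sat on the log",
    "z": "the bird flew",
}
print(tfidf_keywords(docs, top_k=2))
```

A word that occurs in every document gets log(1) = 0, so filler like "the" drops out without a stop-word list; that is the main reason tf-idf suits picking keywords for a corpus.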
This would be especially helpful for discovering connections between notes that aren’t yet linked. Whenever I review my note inbox, I find related notes, often written on the go or in a hurry, that are similar, sometimes almost identical, but aren’t connected to anything else.
This is a tedious manual process, and it would be great to have a tool that could group these and present them to me for review.
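A minimal sketch of that grouping step, assuming Jaccard similarity over each note's set of words (shared words divided by all distinct words) and an arbitrary threshold of 0.5. Pairs above the threshold are merged with a union-find structure, so chains of near-duplicates end up in one group. The function name `group_similar` is hypothetical.

```python
import re
from itertools import combinations


def word_set(text):
    # Distinct lowercase words of a note; a deliberately naive tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    # Size of the intersection divided by size of the union of two word sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar(notes, threshold=0.5):
    # notes: dict of name -> text. Returns sorted groups of notes connected by
    # chains of pairwise Jaccard similarity >= threshold. Notes with no similar
    # partner are left out, since only groups need reviewing.
    parent = {name: name for name in notes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sets = {name: word_set(text) for name, text in notes.items()}
    for a, b in combinations(notes, 2):
        if jaccard(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in notes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


inbox = {
    "n1": "call the plumber about the sink",
    "n2": "call plumber about kitchen sink",
    "n3": "book flights to berlin",
    "n4": "read about tf-idf",
}
print(group_similar(inbox))
```

The two plumber notes share four of six distinct words (about 0.67) and are grouped; the others share at most one word and are left alone.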
I recently discussed this problem space with my brother Patrick Gorton, a computer scientist. It didn’t take many days before he sent me a rough prototype!
I have yet to try it out, but will very soon! See https://github.com/plaffa/text-relations
And his reply: similarity between texts
Also check out http://naturalnode.github.io/natural/string_distance.html
https://github.com/NaturalNode/natural
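natural is a JavaScript library, so this is not its API; it is a Python sketch of Levenshtein distance, one of the string distance measures natural offers, to show what such a measure computes. The helper `normalised_similarity` (1 minus distance over the longer length) is a hypothetical addition. Character-level edit distance works best for short texts such as note titles, since it is sensitive to word order and costs time proportional to the product of the two lengths.

```python
def levenshtein(a, b):
    # Minimum number of single-character insertions, deletions and
    # substitutions needed to turn a into b. Classic dynamic programming,
    # keeping only the previous row in memory.
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; iterate over the longer string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free if equal
            ))
        previous = current
    return previous[-1]


def normalised_similarity(a, b):
    # 1.0 for identical strings, lower as more edits are needed.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
```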
Some people in the Obsidian community are looking into something like this: https://forum.obsidian.md/t/find-similar-notes-python-script/9450/8
‘Map of tools for finding related notes in personal knowledge bases (PKB) – in Roam, Obsidian, Notion, etc.’ https://github.com/taivop/pkb-related-notes
‘Even the best current ‘tools for thought’ apps require you to remember to manually make all the connections between your ideas.’ https://twitter.com/thesephist/status/1458218178966790145