This would be really cool. Can you suggest a way, given
two essays, to decide if, semantically, the first should
point to the second? If an author had a tool like that,
links could be discovered rather than inserted by hand.
That would be valuable ... you've made me think ... I may
have a way of doing something close enough.
Gregor Heinrich has a good paper on Latent Dirichlet Allocation, which I believe is an extension of Latent Semantic Analysis. It is a model that can be used to group documents by semantic content. He gives the mathematical details and the "punchlines" for implementation. The model takes a collection of documents as input and outputs a topic label for each word in each document. Each document can then be plotted in K-dimensional space, where K is the number of possible topics, by using the proportion of each topic in the document as its coordinates. Documents that lie closer together have more similar mixes of topics. You could then use your favorite clustering algorithm, or a simple distance threshold, to decide which documents should link to each other.
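A rough sketch of the distance-threshold step, in case it helps. It assumes the topic proportions have already been produced by some LDA implementation; the essay names, the numbers, the helper name suggest_links and the threshold value are all made up for illustration. Note that Euclidean distance is symmetric, so this only says two essays are related, not which one should point to the other.

```python
import math
from itertools import combinations


def suggest_links(topic_mix, threshold):
    """Return pairs of document names whose topic mixtures lie within `threshold`.

    topic_mix maps a document name to its K topic proportions, i.e. its
    coordinates in the K-dimensional topic space.
    """
    links = []
    # Compare every unordered pair of documents once, in name order.
    for (name_a, mix_a), (name_b, mix_b) in combinations(sorted(topic_mix.items()), 2):
        # Plain Euclidean distance between the two points in topic space.
        if math.dist(mix_a, mix_b) <= threshold:
            links.append((name_a, name_b))
    return links


# Hypothetical topic proportions for K = 3 topics (not real LDA output).
topic_mix = {
    "essay_a": [0.80, 0.15, 0.05],
    "essay_b": [0.70, 0.20, 0.10],
    "essay_c": [0.05, 0.10, 0.85],
}

# The threshold is an arbitrary illustrative value; it would need tuning.
links = suggest_links(topic_mix, threshold=0.25)
print(links)
```

With these made-up numbers only essay_a and essay_b end up close enough to be suggested as a link. A clustering algorithm could replace the threshold test without changing the rest.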
Hmm.