Re: summarizing long articles?




> Are there any PHP scripts that can take a long article and summarize it?

This is certainly possible: one of the tools in MS Word is
"AutoSummarize", which will take a body of text and compress it down to
a given target (e.g. 10 sentences, 25% of the original size, 100 words
or fewer). The results are pretty variable, as the algorithm doesn't
seem to select whole sentences but instead picks out headings and the
first few words from a sample of sentences, in the hope that the
document is well-structured. A simpler bet is to copy the first
paragraph, since a good document should have a precis there anyway.
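
As a rough illustration of the "copy the first paragraph" approach,
here is a minimal sketch in Python rather than PHP (the logic should
port directly). The function name lead_summary and the max_sentences
parameter are made up for this example, and the sentence splitting is
deliberately naive: it will trip over abbreviations like "e.g.".

```python
import re


def lead_summary(text, max_sentences=3):
    """Return up to `max_sentences` sentences from the first paragraph."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if not paragraphs:
        return ""
    # Unwrap hard line breaks so the paragraph becomes a single line.
    first = " ".join(paragraphs[0].split())
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", first)
    return " ".join(sentences[:max_sentences])
```

Calling lead_summary(article, 2) would give the first two sentences of
the opening paragraph, which is only useful if the author actually put
the precis there.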

Simple algorithms extract interesting/relevant keywords from a document
by comparing it with other documents and picking out the words that
occur more often in that document than in the others. You need quite a
large body of documents to make this work, but if your problem is to
categorise a large number of documents without reading each one, then
it is a valuable technique.
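
The keyword comparison described above can be sketched roughly as
follows. This is Python, not PHP, and not the scripts mentioned in this
post. The names tokenize and distinctive_keywords are hypothetical, and
the scoring (relative frequency in the document divided by a smoothed
relative frequency in the other documents) is just one simple way to
do it.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_keywords(target, others, top_n=5):
    """Rank words that occur more often in `target` than in `others`."""
    target_counts = Counter(tokenize(target))
    other_counts = Counter()
    for doc in others:
        other_counts.update(tokenize(doc))

    target_total = sum(target_counts.values()) or 1
    other_total = sum(other_counts.values())
    vocab_size = len(set(target_counts) | set(other_counts))

    scores = {}
    for word, count in target_counts.items():
        target_freq = count / target_total
        # Add-one smoothing (an assumption of this sketch) avoids dividing
        # by zero for words that never appear in the other documents.
        other_freq = (other_counts[word] + 1) / (other_total + vocab_size)
        scores[word] = target_freq / other_freq

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
```

With only a handful of comparison documents the scores are noisy, which
is why a large body of documents matters: common words like "the" only
sink reliably once the comparison counts are big enough.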

I have some PHP scripts that use this method on Usenet posts to get the
gist of the topic being discussed in each thread, if you are
interested.

For other discussions, try these searches on Google: automatic
summarization, automatic summarisation, automatic abridgement.

---
Steve
