Re: Automatic text tagging



Bruno Barberi Gnecco wrote:
On Apr 8, 10:05 pm, Bruno Barberi Gnecco
<brunobgDELETET...@xxxxxxxxxxxxxxxxxxxxx> wrote:

I need to implement an automatic text tagging system. Any suggestions
for algorithms? I've used Bayesian classification with great success when the
categories are fixed and few in number, but in the case of tags I believe
it won't work very well (too few items per tag to train well). I'm also looking
for something more sophisticated than simply finding tags in text.

Any pointers to papers, books or code are appreciated. Thanks a lot.


You mean Part-Of-Speech tags (Noun, Verb, etc.)?

But these *are* a "fixed and small number of categories", aren't
they?

For a small training set, a very successful technique is to take into
account the context, namely the few words to the left and to the right
of the word being tagged. Work with probabilities. In an enlarged
context there are often choices with probability 1 (e.g. the words
"the", "at"). These "ground" choices help choose the others.

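A rough sketch of the context idea above, in Python. The training sentences, the tag names and the helper names (`transition`, `tag_sentence`) are all made up for illustration; the add-one smoothing is my own assumption, not something the thread specifies. Words seen with only one tag are fixed first, then each ambiguous word is scored against its already-tagged neighbours.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences, invented for this illustration.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB"), ("at", "PREP"),
     ("the", "DET"), ("park", "NOUN")],
]

word_tags = defaultdict(Counter)  # word -> counts of tags seen for it
follows = defaultdict(Counter)    # left tag -> counts of the tag to its right
for sentence in TRAIN:
    for i, (word, tag) in enumerate(sentence):
        word_tags[word][tag] += 1
        if i > 0:
            follows[sentence[i - 1][1]][tag] += 1
ALL_TAGS = {tag for counts in word_tags.values() for tag in counts}


def transition(left, right):
    """Estimate P(right tag | left tag) with add-one smoothing (an assumption)."""
    counts = follows[left]
    return (counts[right] + 1) / (sum(counts.values()) + len(ALL_TAGS))


def tag_sentence(words):
    tags = [None] * len(words)
    # Pass 1: "ground" choices, i.e. words seen with exactly one tag.
    for i, word in enumerate(words):
        seen = word_tags.get(word)
        if seen and len(seen) == 1:
            tags[i] = next(iter(seen))
    # Pass 2: ambiguous words, scored using already-tagged neighbours.
    for i, word in enumerate(words):
        seen = word_tags.get(word)
        if tags[i] is not None or not seen:
            continue  # already grounded, or unknown word (left as None)
        total = sum(seen.values())

        def score(candidate):
            s = seen[candidate] / total
            if i > 0 and tags[i - 1] is not None:
                s *= transition(tags[i - 1], candidate)
            if i + 1 < len(words) and tags[i + 1] is not None:
                s *= transition(candidate, tags[i + 1])
            return s

        tags[i] = max(seen, key=score)
    return tags


print(tag_sentence(["the", "run", "ends"]))
print(tag_sentence(["dogs", "run", "at", "the", "park"]))
```

In this toy data "run" was seen once as a noun and once as a verb, so only the grounded neighbours ("the", "ends", "dogs", "at") decide which tag it gets in each sentence.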

No, I mean tags as they're used in many websites nowadays,
describing what the text is about. For example, this message could be
tagged "text mining, tag, probabilities".
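
For reference, the baseline mentioned in the original question (simply finding known tags in the text) might look roughly like this in Python. The function name `find_tags` and the sample vocabulary are hypothetical, chosen only for this sketch.

```python
import re


def find_tags(text, vocabulary):
    """Return the vocabulary tags that occur in text as whole words or phrases."""
    lowered = text.lower()
    found = []
    for tag in vocabulary:
        # \b keeps "tag" from matching inside a longer word such as "tagging".
        pattern = r"\b" + re.escape(tag.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(tag)
    return found


message = "An automatic text tagging system; work with probabilities and text mining."
print(find_tags(message, ["text mining", "tag", "probabilities"]))
```

Note that "tag" is not found here because the message only contains "tagging"; exact matching misses such variants, which is one reason to want something more sophisticated.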

It's not quite ready for prime time, but take a look at http://openpipeline.org. The code will be ready for release in a week or two.

It's not a solution to the problem, but it is a nice framework for plugging in a solution.

Try googling "entity extraction" for useful links.