Re: Automatic text tagging
- From: Chris <spam_me_not@xxxxxxxxxx>
- Date: Sat, 12 Apr 2008 15:27:30 -0500
Bruno Barberi Gnecco wrote:
On Apr 8, 10:05 pm, Bruno Barberi Gnecco
<brunobgDELETET...@xxxxxxxxxxxxxxxxxxxxx> wrote:
I need to implement an automatic text tagging system. Any suggestions
of algorithms? I've used Bayesian classification with great success when the
categories are fixed and in small number, but in the case of tags I believe
it won't work very well (too few items per tag to train well). I'm also looking
for something more sophisticated than simply finding tags in text.
Any pointers to papers, books or code are appreciated. Thanks a lot.
You mean Part-Of-Speech tags (Noun, Verb, etc.)?
But these *are* a "fixed and small number of categories", aren't
they?
For a small training set, a very successful technique is to take the
context into account, namely the few words to the left and to the right
of the word being tagged. Work with probabilities. In an enlarged
context there are often choices with probability 1 (e.g. the words
"the", "at"). These "ground" choices help choose the others.
No, I mean tags as they're used in many websites nowadays,
describing what the text is about. For example, this message could be
tagged "text mining, tag, probabilities".
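
For reference, the "simply finding tags in text" baseline mentioned in the original question could look roughly like the sketch below. The function name `match_tags`, the tag vocabulary and the sample message are all made up for illustration; it only finds tags that literally appear in the text, which is exactly the limitation being discussed.

```python
import re

def match_tags(text, vocabulary):
    """Return the vocabulary tags that occur in text as whole words/phrases."""
    lowered = text.lower()
    found = []
    for tag in vocabulary:
        # \b keeps "tag" from matching inside a longer word such as "stage".
        pattern = r"\b" + re.escape(tag.lower()) + r"\b"
        if re.search(pattern, lowered):
            found.append(tag)
    return found

# Hypothetical vocabulary and message, for illustration only.
vocabulary = ["text mining", "tag", "probabilities", "compilers"]
message = "Text mining question: how do I pick a tag using probabilities?"
print(match_tags(message, vocabulary))
```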
It's not quite ready for prime time, but take a look at http://openpipeline.org. The code will be ready for release in a week or two.
It's not a solution to the problem, but it is a nice framework for plugging in a solution.
Try googling "entity extraction" for useful links.