best method to perform operations on word lists
- From: "Francois Massion" <massion@xxxxxx>
- Date: 10 Jun 2006 00:10:28 -0700
Hi folks,
I am not very good at Perl and would like some advice on the best
way to do the following:
I have a list of approx 20,000 terms extracted from a database. The
list is sorted alphabetically. The entries look like this:
überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise
I want to eliminate the variants of a basic word. In the example above
I want to end up with:
-überzeugt
-überzogen
-üblich
-üblicherweise
I have thought of the following:
(i) I read the list into a hash that maps an index to each term
1 ==> überzeugt
2 ==> überzeugt,
etc.
(ii) I compare each term with the terms that follow it
(iii) if the following condition is met, I delete the following entry
(key+value) with "delete":
$term is a substring of the next term AND
the length difference is, say, below 3 (to avoid deleting
"üblicherweise" which is a different term)
I am not sure this is the right approach. I don't much like the
idea of artificially creating the index list (1 ==> Term1).
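For reference, steps (i) to (iii) could be sketched roughly as below. This is only a minimal sketch, not a tested solution for the full list: the name drop_variants_hash and the $max_diff parameter are hypothetical, the condition is read as "delete the follower when it is met", and the scan of followers stops at the first one that no longer contains the term.

```perl
use strict;
use warnings;
use utf8;    # the sample terms are UTF-8 literals, so length() counts characters

# Hypothetical helper that follows steps (i) to (iii) literally.
# $max_diff is an assumed threshold ("below 3" in the description).
sub drop_variants_hash {
    my ($max_diff, @sorted_terms) = @_;

    # (i) build the index => term hash
    my %term_at;
    my $n = 0;
    $term_at{ ++$n } = $_ for @sorted_terms;

    # (ii) compare each surviving term with the terms that follow it
    for my $i (1 .. $n) {
        next unless exists $term_at{$i};
        my $term = $term_at{$i};
        for my $j ($i + 1 .. $n) {
            next unless exists $term_at{$j};
            my $next = $term_at{$j};
            last if index($next, $term) < 0;    # follower no longer contains $term

            # (iii) a follower that contains $term and is only slightly
            # longer is treated as a variant and removed
            delete $term_at{$j}
                if length($next) - length($term) < $max_diff;
        }
    }

    # return the surviving terms in their original order
    return map { $term_at{$_} } sort { $a <=> $b } keys %term_at;
}

my @terms = (
    'überzeugt', 'überzeugt,', 'überzogen', 'überzogen,', 'überzogen.',
    'üblich',    'übliche',    'üblichen',  'üblicherweise',
);
my @kept = drop_variants_hash(3, @terms);

binmode(STDOUT, ':encoding(UTF-8)');
print "-$_\n" for @kept;
```

With the nine sample entries this leaves überzeugt, überzogen, üblich and üblicherweise.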
I wonder whether I should work with references, but they are something
of a black box to me.
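On those two worries: a plain Perl array already keeps the terms in order, so no hand-made index and no references are needed for a flat list like this. The sketch below is one possible variation, not the only way to do it. The name drop_variants and the $max_diff threshold are hypothetical, and it makes two assumptions beyond the description above: it tests for a prefix rather than any substring, and it compares each term only with the most recently kept term, relying on the list being sorted.

```perl
use strict;
use warnings;
use utf8;    # the sample terms are UTF-8 literals, so length() counts characters

# Hypothetical helper: one pass over an already sorted list, no index hash.
sub drop_variants {
    my ($max_diff, @sorted_terms) = @_;
    my @result;
    for my $term (@sorted_terms) {
        my $base = @result ? $result[-1] : undef;

        # skip $term if it starts with the last kept term
        # and is only slightly longer than it
        next if defined $base
            && index($term, $base) == 0
            && length($term) - length($base) < $max_diff;

        push @result, $term;
    }
    return @result;
}

my @words = (
    'überzeugt', 'überzeugt,', 'überzogen', 'überzogen,', 'überzogen.',
    'üblich',    'übliche',    'üblichen',  'üblicherweise',
);
my @bases = drop_variants(3, @words);

binmode(STDOUT, ':encoding(UTF-8)');
print "-$_\n" for @bases;
```

For a list stored one term per line in a file, the array could be filled with chomp(my @words = <$fh>) on a filehandle opened with the right encoding layer.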
Any comments are appreciated.
Francois