best method to perform operations on word lists



Hi folks,

I am rather bad at Perl and would like some advice on the best
approach to the following:

I have a list of approx 20,000 terms extracted from a database. The
list is sorted alphabetically. The entries look like this:

überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise

I want to eliminate the variants of a basic word. In the example above
I want to end up with:
-überzeugt
-überzogen
-üblich
-üblicherweise

I have thought of the following:

(i) I read the list into a hash that maps an index to each term:
1 ==> überzeugt
2 ==> überzeugt,
etc.

(ii) I compare each term with the terms that follow it

(iii) if the following condition is met, I treat the next term as a
variant and delete its entry (key+value) with "delete"

$term is a substring of the next term AND
the length difference is, say, below 3 (to avoid deleting
"üblicherweise", which is a different term)

I am not sure this is the right approach. I don't much like the
idea of artificially creating the index list (1 ==> Term1).
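
A minimal sketch of the comparison in steps (i) to (iii), using a plain
array so that no artificial index is needed. It is only an illustration
under some assumptions: the list is already sorted, "substring" is
narrowed to "prefix", each term is compared with the last base word that
was kept rather than with every follower, and the threshold of 3 is the
guess from above. The name drop_variants is hypothetical.

```perl
use strict;
use warnings;
use utf8;    # the sample terms below are non-ASCII literals

# Hypothetical helper: given a sorted list of terms, return only the
# base words. $max_extra is the assumed length threshold (3 in the post).
sub drop_variants {
    my ($max_extra, @terms) = @_;
    my @kept;
    for my $term (@terms) {
        if (@kept) {
            my $base  = $kept[-1];
            my $extra = length($term) - length($base);
            # A variant starts with the last kept base word and is
            # only slightly longer (or identical).
            next if index($term, $base) == 0 && $extra < $max_extra;
        }
        push @kept, $term;    # otherwise it becomes the new base word
    }
    return @kept;
}

my @terms = (
    'überzeugt', 'überzeugt,', 'überzogen', 'überzogen,', 'überzogen.',
    'üblich',    'übliche',    'üblichen',  'üblicherweise',
);
my @base = drop_variants(3, @terms);

binmode STDOUT, ':encoding(UTF-8)';
print "-$_\n" for @base;
```

Because the list is sorted, all variants of a word sit directly after
it, so a single pass is enough and nothing has to be deleted from a
hash. One limitation of this sketch: once a longer word such as
"üblicherweise" has been kept, later terms are compared with it and no
longer with "üblich".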

I wonder whether I should work with references, but they are something
of a black box to me.

Any comments are appreciated.

Francois
