Re: best method to perform operations on word lists



Francois Massion wrote:

> I have a list of approx 20,000 terms extracted from a database. The
> list is sorted alphabetically. The entries look like this:
>
> überzeugt
> überzeugt,
> überzogen
> überzogen,
> überzogen.
> üblich
> übliche
> üblichen
> üblicherweise

You can first clean it up by removing the punctuation at the end of
each line, and then pipe it through uniq:

perl -ple 's/[.,]$//' infile | uniq > infile-1

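Note that uniq only compares neighbouring lines, so it will miss
duplicates that are no longer adjacent once the punctuation is gone.
A minimal sketch that does the cleanup and the de-duplication in one
pass, assuming the whole word list fits in a hash; the sub name
clean_unique and the script name are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing '.' or ',' from each line and
# print a word only the first time it is seen (input order is kept).
sub clean_unique {
    my ($in, $out) = @_;
    my %seen;                          # words already printed
    while (my $word = <$in>) {
        chomp $word;
        $word =~ s/[.,]$//;            # same cleanup as the one-liner
        print {$out} "$word\n" unless $seen{$word}++;
    }
    return;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    clean_unique($in, \*STDOUT);
    close $in;
}
```

Usage would be something like: perl clean-unique.pl infile > infile-1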

> I want to eliminate the variants of a basic word. In the example above
> I want to end up with:
> -überzeugt
> -überzogen
> -üblich
> -üblicherweise

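You are bound to lose more than you win: any suffix rule will also
merge unrelated words, e.g. "Reise" (journey) would be dropped as a
variant of "Reis" (rice).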

If you are not in a hurry and have plenty of memory, you can slurp the
whole file in, and then do

1 while s/ ^ (.+) \n \1 (?:e|en|t) $ /$1/xm ;

but that rescans the text from the start after every substitution; a
while-loop that remembers the previous line is far more efficient.
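
A minimal sketch of such a loop, assuming the input is already cleaned,
sorted and free of duplicates, and that only the suffixes e, en and t
count as variants. The sub name drop_variants and the script name are
made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: read a sorted word list line by line and skip
# every word that is the last kept word plus 'e', 'en' or 't'.
sub drop_variants {
    my ($in, $out) = @_;
    my $base;                          # last word that was kept
    while (my $word = <$in>) {
        chomp $word;
        next if defined $base
            && $word =~ /\A\Q$base\E(?:e|en|t)\z/;
        print {$out} "$word\n";
        $base = $word;                 # new base for the following lines
    }
    return;
}

# Runs only when a file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    drop_variants($in, \*STDOUT);
    close $in;
}
```

Usage would be something like: perl drop-variants.pl infile-1 > infile-2
It makes a single pass and holds only one word in memory. The base
word is quoted with \Q...\E so that any regex metacharacters in the
list are matched literally.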

--
Affijn, Ruud

"Gewoon is een tijger."

