Re: best method to perform operations on word lists




Bart Van der Donck wrote:

Francois Massion wrote:

Well, the issue is really a matter of pragmatism. If I do the work
manually or with some VBA macros, it will take ages. The situation I am
trying to address is not uncommon for people working on glossaries.
Therefore I am trying to find a language-independent solution
that works for, say, 90% of the words.

If you can afford such an error margin, here is a brute-force approach:

#!perl
use strict; use warnings;
my $list =
"überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise";
my @terms = split /\n/, $list;
my $prev = 'nonesuch584685542256RANOM58544';

This didn't modify the list. Maybe the reason is the $prev definition.


# Strip a trailing "." or "," and/or the ending "e"/"en" from every term.
s/(\.|,|e|en|e,|en,|e\.|en\.)$// for @terms;

I also tried Dr. Ruud's regex, but it would have to be rewritten for
each language. Here is an example of a Polish list:
zeliwa
zeliwa.
zeliwa,
zeliwna
zeliwnej
zeliwny
zeliwo
zelu
zurawia
Zurawie
zuraw
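
One way to avoid per-language endings is to compare the words with each
other instead of matching fixed suffixes. The sketch below is only an
assumption and was not proposed in this thread: the names group_terms and
common_prefix_len and the thresholds $MIN_PREFIX and $MAX_SUFFIX are made
up for illustration. It strips trailing punctuation, lowercases, sorts,
and then adds a word to the current group when it shares at least
$MIN_PREFIX leading characters with the group's first word and both words
differ only in a short ending.

```perl
#!perl
use strict; use warnings;

# Hypothetical thresholds, not from the thread; tune them per glossary.
my $MIN_PREFIX = 5;   # shared leading characters required to merge two words
my $MAX_SUFFIX = 3;   # longest differing ending still treated as an inflection

# Length of the common leading part of two strings.
sub common_prefix_len {
    my ($x, $y) = @_;
    my $max = length($x) < length($y) ? length($x) : length($y);
    my $i = 0;
    $i++ while $i < $max && substr($x, $i, 1) eq substr($y, $i, 1);
    return $i;
}

# Returns a list of array refs, one per group of presumed variants.
sub group_terms {
    my @words = @_;
    my (%seen, @clean);
    for my $w (@words) {
        my $t = lc $w;
        $t =~ s/[[:punct:]]+$//;            # drop trailing "." or ","
        next if $t eq '' or $seen{$t}++;    # skip empties and exact repeats
        push @clean, $t;
    }
    my @groups;
    for my $t (sort @clean) {
        if (@groups) {
            my $head = $groups[-1][0];      # first word of the current group
            my $n    = common_prefix_len($head, $t);
            if (   $n >= $MIN_PREFIX
                && length($head) - $n <= $MAX_SUFFIX
                && length($t)    - $n <= $MAX_SUFFIX) {
                push @{ $groups[-1] }, $t;  # close enough: same group
                next;
            }
        }
        push @groups, [$t];                 # otherwise start a new group
    }
    return @groups;
}

my @polish = split ' ', 'zeliwa zeliwa. zeliwa, zeliwna zeliwnej zeliwny
                         zeliwo zelu zurawia Zurawie zuraw';
print join(' ', @$_), "\n" for group_terms(@polish);
```

On the Polish list above this merges the zeliw- forms into one group and
the zuraw- forms into another, and leaves zelu on its own. Unrelated words
that share a long prefix will be merged wrongly, so the thresholds need
tuning, and non-ASCII input would have to be decoded first so that
length() counts characters rather than bytes.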


# Sort, then drop consecutive duplicates; $prev holds the last term kept.
@terms = grep($_ ne $prev && ($prev = $_), sort @terms);
print $_."\n" for @terms;
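
On the $prev question: in the script above $prev is only used by the
grep that removes duplicates, while the s/// line does the actual
stripping, so the sentinel value is unlikely to be the culprit. A guess,
and only a guess, is that the real input has CRLF line endings, in which
case the "$" anchor sits after a "\r" and nothing matches. The sketch
below restates the same approach with a %seen hash instead of a sentinel
and trims trailing whitespace first; the name reduce_terms is made up for
illustration.

```perl
#!perl
use strict; use warnings;

# Strip trailing punctuation and one "e"/"en" ending from each term,
# then return the unique results in sorted order.
sub reduce_terms {
    my @terms = @_;            # work on a copy of the input
    my %seen;
    for (@terms) {
        s/\s+$//;              # trailing whitespace, including a stray "\r"
        s/[.,]+$//;            # trailing "." or ","
        s/(?:en|e)$//;         # same endings as the regex above
    }
    my @unique = sort grep { length($_) && !$seen{$_}++ } @terms;
    return @unique;
}

my @terms = split /\n/, "überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise";
print "$_\n" for reduce_terms(@terms);
```

The hash also keeps a term such as "0", which the
$_ ne $prev && ($prev = $_) test would drop because the assignment
returns a false value.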

FWIW,

--
Bart
