Re: best method to perform operations on word lists
- From: "Francois Massion" <massion@xxxxxx>
- Date: 10 Jun 2006 10:02:55 -0700
Bart Van der Donck wrote:
Francois Massion wrote:
Well, the issue is really a matter of pragmatism. If I do the work
manually or with some VBA macros it will take ages. The situation I am
trying to address is not uncommon for people working on glossary
issues. Therefore I am trying to find a language-independent solution
which works, say, for 90% of the words.
If you can afford such an error margin, here is a brute-force approach:
#!perl
use strict; use warnings;
my $list =
"überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise";
my @terms = split /\n/, $list;
# Sentinel for the duplicate check below; it only has to differ from every real term.
my $prev = 'nonesuch584685542256RANOM58544';
This didn't modify the list. Maybe the reason is the $prev definition.
# Strip one trailing "." or ",", a German "-e"/"-en" ending, or both together.
s/(\.|,|e|en|e,|en,|e\.|en\.)$// for @terms;
I also tried Dr. Ruud's regex, but it would have to be rewritten for
each language. Here is an example of a Polish list:
zeliwa
zeliwa.
zeliwa,
zeliwna
zeliwnej
zeliwny
zeliwo
zelu
zurawia
Zurawie
zuraw
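
The language-specific part of the approach is the suffix rule; trimming punctuation and removing duplicates is not. A minimal sketch of that language-neutral step, applied to the Polish list above, might look as follows. The helper name `unique_terms` is hypothetical, and folding case for the comparison is an assumption, not something the thread asks for. Inflected forms such as zeliwna/zeliwnej/zeliwny are deliberately left alone.

```perl
#!perl
use strict; use warnings;

# Hypothetical helper (not from the thread): strip trailing whitespace and
# punctuation, then drop entries that differ only in case.
sub unique_terms {
    my @terms = @_;                      # work on a copy of the input
    my (%seen, @unique);
    for my $term (@terms) {
        $term =~ s/\s+$//;               # trailing whitespace, including a stray "\r"
        $term =~ s/\p{Punct}+$//;        # trailing ".", "," and similar marks
        next if $term eq '';             # nothing left after trimming
        next if $seen{ lc $term }++;     # already seen (case-insensitive, an assumption)
        push @unique, $term;             # first spelling encountered is kept
    }
    return @unique;
}

my @polish = split /\n/, <<'END';
zeliwa
zeliwa.
zeliwa,
zeliwna
zeliwnej
zeliwny
zeliwo
zelu
zurawia
Zurawie
zuraw
END

print "$_\n" for unique_terms(@polish);
```

On this list the sketch keeps nine entries: only the three spellings of zeliwa collapse into one, so any suffix handling would still have to be supplied per language.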
# Sort, then keep a term only when it differs from the one before it.
@terms = grep($_ ne $prev && ($prev = $_), sort @terms);
print $_."\n" for @terms;
FWIW,
--
Bart
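
On the question of whether `$prev` is to blame: the sentinel is only used by the `grep` that removes duplicates, so it cannot stop the substitution from running. One possible cause, which is an assumption and not confirmed anywhere in the thread, is a word list saved with Windows line endings. After `split /\n/` each term would then still end in "\r", and a pattern anchored with `$` would not match. The sketch below trims that first and swaps the `$prev` idiom for a `%seen` hash, which needs no sentinel and no sort, and which does not drop a term that happens to be "0" or empty. The helper names `strip_endings` and `dedupe` are hypothetical.

```perl
#!perl
use strict; use warnings;
use utf8;                                   # the sample data below contains literal umlauts
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the suffix rule from the post, preceded by a trim.
sub strip_endings {
    my @terms = @_;                         # work on a copy of the input
    for (@terms) {
        s/\s+$//;                           # removes a "\r" left over from CRLF files
        s/(\.|,|e|en|e,|en,|e\.|en\.)$//;   # suffix rule from the post, unchanged
    }
    return @terms;
}

# Order-preserving duplicate removal without a $prev sentinel.
sub dedupe {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Sample data with CRLF line endings (an assumption about the input file).
my @terms = split /\n/,
    "überzeugt\r\nüberzeugt,\r\nüblich\r\nübliche\r\nüblichen\r";

print "$_\n" for dedupe(strip_endings(@terms));
```

If the real input has plain "\n" line endings, the extra trim is harmless and the cause of the unchanged list lies elsewhere.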