Re: best method to perform operations on word lists



I have come up with something which seems to work (partially). For my
current purpose it'll do the trick but any suggestion for optimization
is welcome:

(...)
chomp(my $prev = <TERMS>);

my @reducedlist = ($prev);

while ( <TERMS> ) {
    chomp;
    # the maximum length of a suffix can be set here
    push @reducedlist, $_
        unless /^\Q$prev\E/ && length($_) - length($prev) < 3;
    $prev = $_;
}
print "$_\n" for @reducedlist;
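
One possible reason the loop above only works partially: $prev is updated on every line, so "überzogen." gets compared with "überzogen," rather than with "überzogen", and is kept. A minimal sketch of a variant that compares each term with the last kept term instead. The name reduce_terms and the sample words are made up for illustration, and index() is used so that punctuation in a term is never treated as a regex metacharacter:

```perl
use strict;
use warnings;

# Hypothetical helper: drop a term when it is the last kept term
# plus at most $max_suffix extra characters.
sub reduce_terms {
    my ($max_suffix, @terms) = @_;
    my @kept;
    for my $term (@terms) {
        if (@kept) {
            my $base  = $kept[-1];
            my $extra = length($term) - length($base);
            # index(...) == 0 means $base is a literal prefix of $term
            next if $extra >= 0
                 && $extra <= $max_suffix
                 && index($term, $base) == 0;
        }
        push @kept, $term;
    }
    return @kept;
}

# "< 3" in the loop above corresponds to a maximum suffix length of 2
print "$_\n" for reduce_terms(2, 'abc', 'abc,', 'abd', 'abd,', 'abd.');
```

With the sample input this prints "abc" and "abd". Like the original loop, it assumes the input is sorted, so that a base form comes directly before its variants.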

[Amazing to see how much time people can invest in a few lines of code
when they are not professionals ;-)! ]

Francois

Bart Van der Donck wrote:

Francois Massion wrote:

[...]
#!perl
use strict; use warnings;
my $list =
"überzeugt
überzeugt,
überzogen
überzogen,
überzogen.
üblich
übliche
üblichen
üblicherweise";
my @terms = split /\n/, $list;
my $prev = 'nonesuch584685542256RANOM58544';

This didn't modify the list.

I didn't mean to modify $list; the new content is in @terms. If you
want $list to contain the new words, you can use something like this at
the end of the program:

$list = join "\n", @terms;

Maybe the reason is the $prev definition.

The initial value of $prev has no importance of its own; it only has to
be a string that does not occur in @terms, because $prev is used to
delete duplicate entries from @terms.
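
The duplicate-removal step itself is elided from the quoted code, so the following is only a sketch of the general idea, not the original lines. It assumes the list is sorted so that duplicates sit next to each other, and it starts $prev as undef instead of a sentinel string; dedupe_adjacent is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: remove entries that repeat the entry just before them.
sub dedupe_adjacent {
    my @terms = @_;
    my $prev;                     # undef: nothing seen yet, no sentinel needed
    my @unique;
    for my $term (@terms) {
        next if defined $prev && $term eq $prev;
        push @unique, $term;
        $prev = $term;
    }
    return @unique;
}

print "$_\n" for dedupe_adjacent(qw(a a b b c));
```

Starting from undef avoids having to pick a string that is guaranteed not to appear in the data.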

s/(\.|,|e|en|e,|en,|e\.|en\.)$// for @terms;
I also tried Dr. Ruud's regex but it would have to be rewritten for
each language.

That is correct, hence my thoughts about language files. My code is a
very crude algorithm - it only strips the following from the end of
each line:

. , e en e, en, e. en.

If you are planning to use this for different languages, you would
obviously need to modify those patterns each time.
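
A rough sketch of how the language-file idea could look, with the endings kept in a table per language and the pattern built from it. The names %endings_for and strip_endings are made up, and only the German endings come from this thread; entries for other languages would have to be supplied:

```perl
use strict;
use warnings;

# Hypothetical per-language table; only the German endings are from the thread.
my %endings_for = (
    de => [qw(en e)],
);

# Strip one ending for the given language, plus an optional trailing "." or ",".
sub strip_endings {
    my ($lang, @terms) = @_;
    my $endings = $endings_for{$lang}
        or die "no endings defined for language '$lang'\n";
    # Longest ending first, so "en" is tried before "e".
    my $alt = join '|',
              map  { quotemeta($_) }
              sort { length($b) <=> length($a) } @$endings;
    my $re  = qr/(?:$alt)?[.,]?$/;
    s/$re// for @terms;           # @terms is a copy; the caller's list is untouched
    return @terms;
}

print "$_\n" for strip_endings('de', 'normale', 'normalen,', 'normal.');
```

For 'de' the generated pattern removes the same eight endings as the s/// above, so all three sample words come out as "normal".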

--
Bart
