Re: Converting a string to multiple search patterns
From: Anno Siegel (anno4000_at_lublin.zrz.tu-berlin.de)
Date: 06/08/04
- Next message: Richard Bell: "Re: Using LWP to get last modified date of web page"
- Previous message: Bernard El-Hagin: "Re: Newbie"
- In reply to: Tore Aursand: "Converting a string to multiple search patterns"
- Next in thread: Tore Aursand: "Re: Converting a string to multiple search patterns"
- Reply: Tore Aursand: "Re: Converting a string to multiple search patterns"
Date: 8 Jun 2004 11:53:53 GMT
Tore Aursand <tore@aursand.no> wrote in comp.lang.perl.misc:
> Hi all!
>
> I'm stumped on this one: I have an application where I need to refine the
> search mechanism. The concept is quite simple: Get a string, convert it
> to separate words, count (and "score") each word for each document, and
> then display the result based on the score;
>
>     my $query = 'A B C D';
>     my @words = split( /\s+/, $query );
>     foreach ( @documents ) {
>         # ...
>     }
>
> I need to refine it, as said. I want a higher score for word sequences,
> and in a particular order. For the example above ('A B C D'), I want to
> match in this order:
>
> 1. A B C D
> 2. A B C
> 3. B C D
> 4. A B
> 5. C D
> 6. A C
> 7. B D
> 8. A D
> 9. A
> 10. B
> 11. C
> 12. D
I'm missing "A B D", "A C D", and "B C" from the collection.
Are these entirely arbitrary?
> Anyone know of a module which can accomplish this? I really haven't tried
> with anything yet, 'cause I have no clue on how to do it. The closest
> thing I've been, has been with the Algorithm::Permute module. It doesn't
> give me what I want "out of the box", though...
I'm not sure what you are asking. Is it the generation of all selections
of 1 .. 4 objects from a set of 4? These don't correspond to permutations,
but to four-digit binary numbers (so there are 2**4 - 1 = 15 of them,
not counting the empty selection). I'm sure there is a module on CPAN
to generate them, but ad-hoc solutions aren't too hard either.
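
The binary-number view can be sketched as follows: a counter runs from 1 to 2**4 - 1, and each value selects the words whose bit is set. This is only a minimal ad-hoc sketch, not a CPAN module; the helper name selections() is made up for illustration, and the selections come out in counter order, not in the ranking from the list above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every non-empty selection of @items,
# keeping the original word order inside each selection.
sub selections {
    my @items = @_;
    my $n     = @items;
    my @result;
    # Each number from 1 to 2**$n - 1 acts as a bitmask:
    # bit $i set means "include $items[$i]".
    for my $mask ( 1 .. 2**$n - 1 ) {
        my @picked = grep { $mask & ( 1 << $_ ) } 0 .. $#items;
        push @result, [ @items[@picked] ];
    }
    return @result;
}

my @selections = selections( qw( A B C D ) );
print scalar(@selections), " selections\n";
print "@$_\n" for @selections;
```

Sorting the result into the desired ranking (longest first, then by position) would be a separate step on top of this.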
Or is the issue how to assign a score to each of a collection of
regexes and retrieve the score after each match? This can be done
using the (?{}) construct to execute code at match time.
Starting from your list (@lines, say) above, I'd generate a list @score
of pairs where each pair holds a score and a string to match:

    my @score = map [ split /\./], @lines;
    $_->[ 1] =~ tr/ //d for @score;

The second line simplifies things by deleting all blanks from the strings
to match. Your practical regexes may look different.
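
To make the shape of @score concrete, here is a small sketch that runs those two lines on a handful of entries. The contents of @lines are an assumption: a few lines copied from the ranked list, one string per line.

```perl
use strict;
use warnings;

# Assumed input: a few entries from the ranked list, one per element.
my @lines = ( '1. A B C D', '2. A B C', '4. A B', '9. A' );

# Split "rank. words" at the dot, giving pairs like [ '1', ' A B C D' ].
my @score = map [ split /\./ ], @lines;

# Delete all blanks from the string part, giving [ '1', 'ABCD' ].
$_->[1] =~ tr/ //d for @score;

print "$_->[0] => $_->[1]\n" for @score;
```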
Build an alternation of patterns where each pattern includes code
to set a variable ($scored) to the corresponding score. Alternatives are
tried in list order, so the higher-ranked sequences win at each position:

    my $rex = join '|', map "$_->[ 1](?\{ \$scored = $_->[ 0] \})", @score;

Generate a test string and check it:

    my $text = join '', map qw( A B C D E)[ rand 5], 1 .. 100;
    my $scored;
    use re 'eval';    # needed because $rex is interpolated into the pattern
    while ( $text =~ /($rex)/g ) {
        print "score $scored: $1\n";
    }
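
If embedding code in the regex feels too heavy, a different technique gives the same result as long as the patterns are literal strings: match with a plain alternation and look the score up in a hash keyed by the matched text. This is a sketch of that alternative, not the (?{}) approach; %score_for and the fixed test string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Score/pattern pairs in the same shape as @score (blanks already removed).
my @score = ( [ 1, 'ABCD' ], [ 2, 'ABC' ], [ 4, 'AB' ], [ 9, 'A' ] );

# Map each literal pattern to its score.
my %score_for;
$score_for{ $_->[1] } = $_->[0] for @score;

# Keep the ranked order in the alternation so longer sequences are tried first.
my $alternation = join '|', map { quotemeta $_->[1] } @score;
my $rex         = qr/($alternation)/;

my $text = 'ABCDEABXABCA';    # fixed text so that the run is repeatable
my @hits;
while ( $text =~ /$rex/g ) {
    push @hits, "score $score_for{$1}: $1";
}
print "$_\n" for @hits;
```

The lookup only works while each alternative matches exactly its own text. Once the patterns become real regexes, the matched string no longer identifies the alternative, and the (?{}) approach is the one that still applies.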
Anno