ANNOUNCE: Search::InvertedIndex::Simple V 1.00
From: Ron Savage (ron_at_savage.net.au)
Date: 02/23/05
Date: Wed, 23 Feb 2005 10:16:58 GMT
The pure Perl module Search::InvertedIndex::Simple V 1.00
is available immediately from CPAN,
and from http://savage.net.au/Perl-modules.html.
On-line docs, and a *.ppd for ActivePerl are also
available from the latter site.
An extract from the docs:
The input to new(dataset => $a, keyset => $k) is an arrayref of data (each element of which is a hashref), and an arrayref of keys.
The arrayref of data is in the format returned by many DBI methods, eg DBI's fetchall_arrayref({}) and DBIx::SQLEngine's fetch_select().
The arrayref of keys is used to select a subset of the keys within each hashref. These selected keys become the primary keys in the hashref returned by the method build_index().
In the example in the synopsis, build_index() will return a hashref with the primary keys 'address' and 'time'.
The values (assumed to be strings) from the arrayref of data corresponding to those keys are used to create a set of secondary keys under each of these primary keys.
The secondary keys are created by taking these values, growing them one character at a time, and using these generated strings as the secondary keys in the hashref returned by the method build_index().
In the example in the synopsis, build_index() will return a hashref where the primary key 'address' will have these secondary keys: H, He, Hea, Heav, Heave, Heaven, Her, Here, T, Th, The, Ther, There.
This means that all data values for the key 'address', and all prefixes of those values, are used to create entries in the returned hashref.
Similarly, the primary key 'time' will have a set of secondary keys.
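To make the prefix generation concrete, here is a minimal sketch in plain Perl. It is not the module's own code: prefixes() is a hypothetical helper, and the three address values (Heaven, Here, There) are inferred from the secondary keys listed above, not copied from the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::InvertedIndex::Simple:
# return every non-empty prefix of a string, shortest first.
sub prefixes
{
	my($value) = @_;

	return map { substr($value, 0, $_) } 1 .. length($value);
}

# Address values inferred from the secondary keys listed above.
my(%seen);

$seen{$_} = 1 for map { prefixes($_) } qw(Heaven Here There);

# Prints: H, He, Hea, Heav, Heave, Heaven, Her, Here, T, Th, The, Ther, There
print join(', ', sort keys %seen), "\n";
```

Prefixes shared by several values (eg 'H' and 'He') collapse into a single secondary key, which is why 3 values yield only 13 keys.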
These sets of secondary keys can be used to test for the existence of values, eg by looking up user-supplied input of any length. Any number of keys can be searched for simultaneously.
Consider:
my($indexer) = Search::InvertedIndex::Simple -> new(...);
my($index) = $indexer -> build_index();
Now we can tell instantaneously which elements of the dataset contain the results of a multi-key search:
my(@index) = $$index{'address'}{'He'} -> intersection($$index{'time'}{'T'});
That is, @index = (1). In other words, $$d[1] contains the only hashref where we have an address value starting with 'He' and a time value starting with 'T'.
Here, intersection() is a method available to objects of type Set::Array, and it returns a list.
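For readers without the module to hand, the whole idea can be sketched with plain hashes. This is only an illustration under stated assumptions: build_simple_index() and intersect() are hypothetical stand-ins for build_index() and Set::Array's intersection(), and the dataset below is invented so that the result matches the (1) described above. It is not the dataset from the synopsis.

```perl
use strict;
use warnings;

# Hypothetical dataset, invented for this sketch.
my($dataset) =
[
	{address => 'Here',   time => 'Now'},   # Index 0.
	{address => 'Heaven', time => 'Then'},  # Index 1.
	{address => 'There',  time => 'Today'}, # Index 2.
];
my($keyset) = ['address', 'time'];

# Hypothetical stand-in for build_index().
# Result shape: $$index{primary key}{prefix}{dataset index} = 1.
sub build_simple_index
{
	my($data, $keys) = @_;
	my(%index);

	for my $i (0 .. $#$data)
	{
		for my $key (@$keys)
		{
			my($value) = $$data[$i]{$key};

			next if (! defined $value);

			# Record this dataset index under every prefix of the value.
			$index{$key}{substr($value, 0, $_)}{$i} = 1 for 1 .. length($value);
		}
	}

	return \%index;
}

# Hypothetical stand-in for Set::Array's intersection():
# return the dataset indexes present in both sets, in numeric order.
sub intersect
{
	my($left, $right) = @_;

	$left  ||= {};
	$right ||= {};

	return sort { $a <=> $b } grep { exists $$right{$_} } keys %$left;
}

my($index) = build_simple_index($dataset, $keyset);
my(@match) = intersect($$index{'address'}{'He'}, $$index{'time'}{'T'});

# Prints: 1
print "@match\n";
```

A prefix that never occurs simply has no entry, so intersect() treats it as an empty set and the search returns an empty list.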
--
Cheers
Ron Savage, ron@savage.net.au on 23/02/2005
http://savage.net.au/index.html
Let the record show: Microsoft is not an Australian company