Re: Searching substrings in records.
- From: janesconference@xxxxxxxxx
- Date: Fri, 27 Jun 2008 07:10:08 -0700 (PDT)
The scan is implemented as a simple FSM (finite state machine). But
the idea is to read the data as few times as possible. You might
specialize it even more by scanning for the pattern in the routine
that fetches the raw records, skipping the merge step. So then the
algorithm looks like
fetch next record
  for each text field in the record
    scan text for foo (or whatever your search pattern is)
    if match, log success on the record ID
Excuse me, but I just can't tell the difference between the algorithm
you introduced above and the brute force approach I described in my
first post:
"The brute force solution would be to iterate over every string field
of every record and perform a substring search on each of them.
If at least one field contains the substring, the record must be
returned."
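For reference, here is a minimal Python sketch of that brute force scan. The record layout (an ID paired with a dict of fields) and the names find_records and records are assumptions made up for illustration; nothing in the thread specifies them.

```python
def find_records(records, pattern):
    """Return the IDs of records where at least one string field contains pattern.

    records is assumed to be an iterable of (record_id, fields) pairs,
    where fields is a dict mapping field names to values (hypothetical layout).
    """
    matches = []
    for record_id, fields in records:
        # any() stops at the first matching field, so a record is logged
        # once and its remaining fields are not scanned.
        if any(isinstance(value, str) and pattern in value
               for value in fields.values()):
            matches.append(record_id)
    return matches


# Made-up sample data; non-string fields such as "year" are skipped.
records = [
    (1, {"title": "foo fighters", "year": 1995}),
    (2, {"title": "bar", "artist": "baz"}),
    (3, {"title": "qux", "artist": "buffoon"}),
]

print(find_records(records, "foo"))
```

Each record is read once and each field is scanned at most once, so the cost grows linearly with the total amount of text.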
This could be programmed fairly directly in a Perl script. As is often
the case, the right tool/language for the job can make things much
easier.
Or in Python, but the language doesn't matter; right now I'm focused
on the ways to do it.
This all assumes the search is fairly infrequent, like an ad hoc
search. If it is to be done on a regular basis, then building indices
might be of benefit. (consider for example the Oracle CONTEXT
package).
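To make the index idea concrete, one possible approach (not how Oracle CONTEXT works, just a common technique for substring search) is a trigram inverted index: map every 3-character slice of every field to the records containing it, intersect those sets for the query, then verify the few candidates with a real substring test. The sketch below assumes the same hypothetical (record_id, fields) layout, and all names in it are illustrative. Unlike a plain substring scan, it is case-insensitive.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(records):
    """Map each trigram to the set of record IDs whose string fields contain it."""
    index = defaultdict(set)
    for record_id, fields in records:
        for value in fields.values():
            if isinstance(value, str):
                for gram in trigrams(value.lower()):
                    index[gram].add(record_id)
    return index


def search(index, records_by_id, pattern):
    """Return sorted IDs of records with a string field containing pattern."""
    pattern = pattern.lower()
    grams = trigrams(pattern)
    if grams:
        # A match must contain every trigram of the pattern.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Patterns shorter than 3 characters have no trigrams: scan everything.
        candidates = set(records_by_id)
    # Trigram hits can come from different fields or positions, so verify.
    return sorted(
        rid for rid in candidates
        if any(isinstance(v, str) and pattern in v.lower()
               for v in records_by_id[rid].values())
    )


# Made-up sample data for illustration only.
records = [
    (1, {"title": "foo fighters", "year": 1995}),
    (2, {"title": "bar", "artist": "baz"}),
    (3, {"title": "qux", "artist": "buffoon"}),
]
index = build_index(records)
records_by_id = dict(records)

print(search(index, records_by_id, "foo"))
```

The index costs memory and build time up front, so it only pays off when the same data is searched repeatedly.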
Yeah, it's an ad hoc search. I've seen commercial programs (like
Mixmeister on its library of mp3 tags, for example) do this kind of
thing so quickly that they implement a search-while-you-type dialog.