Re: Searching substrings in records.




The scan is implemented as a simple FSM (finite state machine). But
the idea is to read the data a few times as possible. You might even
specialize it even more by scanning for the pattern in the routine
that fetches the raw records, skipping the merge step. so then the
algorithm looks like
fetch next record
    for each text field in the record
       scan text for foo (or whatever your search pattern is)
       if match, log success on the record ID

Excuse me, but I just can't tell the difference between the algorithm
you introduced above and the brute force approach I described on the
first post:

"The brute force solution would be iterate on every string field of
every record and perform a substring search on any of them.
If at least one field contains the substring, the record must be
returned. "

This could be programmed fairly directly in a PERL script. As is often
the case, the right tool/language for the job can make things much
easier.

Or in Python, but language doesn't matter, now I'm focused on the ways
to do it.

This all assumes the search is fairly infrequent, like an ad hoc
search. If it is to be done on a regular basis, then building indices
might be of benefit. (consider for example the Oracle CONTEXT
package).

Yeah, it' an ad hoc search. I've seen commercial programs (like
Mixmeister on its library of mp3 tags, for example) doing this kind of
things so quickly that it implements a search-while-you-type dialog.
.



Relevant Pages

  • Re: Project help
    ... Low Power AES algorithm using VHDL ... in GALIOS FIELD and other parts of the algorithm like galios field ...   port( ...
    (comp.lang.vhdl)
  • Re: Lockfree flqueue and lockfree_mpmc ...
    ... and they all have the same problem under contention, ... MOV ECX, EAX ... there is a problem with this algorithm on the push and pop side... ...   then ...
    (comp.programming.threads)
  • Re: Project help
    ... Low Power AES algorithm using VHDL ... in GALIOS FIELD and other parts of the algorithm like galios field ...   port( ...
    (comp.lang.vhdl)
  • Re: ratio approximation algorithm
    ... I'm looking for an algorithm other than brute force that ... >> will speed up the searching process. ... I'd be surprised if Monte Carlo failed to suit your needs. ... a pure brute force approach took 0.37 seconds on my laptop (1.8GHz ...
    (comp.programming)
  • Re: Bonehead basic crypto question
    ... Even if 256-bit is broken by brute force using quantum computers ... as is secure should be used. ... People might like to say "even if an algorithm is ... be conservative) and focus on eliminating shortcut attacks. ...
    (sci.crypt)