Re: Searching substrings in records.
- From: jt@xxxxxxxxxxx (Jens Thoms Toerring)
- Date: 27 Jun 2008 10:33:17 GMT
janesconference@xxxxxxxxx wrote:
Suppose I have a lot of records, each composed of a fixed number of
fields, like a database. Some of these fields are strings (or char
arrays, what you prefer).
I want to extract every record in my record collection whose string
field contain a given substring. Substrings can be matched only within
words, i.e. "foo" matches "you are a fool" but not "babyproof oo
programming languages" (I've come with the two most idiot examples a
human mind could conceive, I know).
The brute force solution would be iterate on every string field of
every record and perform a substring search on any of them.
If at least one field contains the substring, the record must be
returned.
The question: is there a smarter approach than this? Maybe indexing
somehow the records before performing the searches? And are there
decent (c/c++) libraries dealing with this specific problem?
If you want to know if a substring is in one of the strings of
your records you have to go through them once - how would you
know otherwise? Now, if you would search for the same substring
again and again and the records don't change then, of course, it
might make sense to store the result somewhere and use the al-
ready computed result.
I have no clear idea what you mean by "indexing the records be-
fore performing the search". Unless you have some prior infor-
mations about the substrings that will be searched for all that
you could reasonably precompute is the lengths of the strings
in the records and don't search those strings that are shorter
than the string that gets searched for and similar, very general
informatione (like does the strings contain upper or lower case
characters and, if you know the language, if characters with a
very high or low chance to appear in a text are in the string).
But the question you should askyourself is if searching for sub-
strings is theperformance bottle neck in your application before
investing much time to optimize this aspect of it. Did you con-
sider sticking the records into a database and let the database
do the dirty work? That's probably the nearest match you will
get when looking for a library dealing with such problems.
Regards, Jens
--
\ Jens Thoms Toerring ___ jt@xxxxxxxxxxx
\__________________________ http://toerring.de
.
- Follow-Ups:
- Re: Searching substrings in records.
- From: janesconference
- Re: Searching substrings in records.
- From: janesconference
- Re: Searching substrings in records.
- References:
- Searching substrings in records.
- From: janesconference
- Searching substrings in records.
- Prev by Date: Re: recursion with C
- Next by Date: Re: Searching substrings in records.
- Previous by thread: Searching substrings in records.
- Next by thread: Re: Searching substrings in records.
- Index(es):
Relevant Pages
|
Loading