(Unsure where to post)
From: atomx (prothe113_at_yahoo.com)
Date: 12/09/04
- Next message: Neil W Rickert: "Re: Platonism"
- Previous message: Lester Zick: "Re: Platonism"
- Next in thread: Colin Percival: "Re: (Unsure where to post)"
- Reply: Colin Percival: "Re: (Unsure where to post)"
- Reply: Nicholas King: "Re: (Unsure where to post)"
- Reply: Kent Paul Dolan: "Re: (Unsure where to post)"
- Reply: stephen_at_nomail.com: "Re: (Unsure where to post)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 9 Dec 2004 12:16:59 -0800
Hi all, I hope this is not a completely off-topic post.
I am a programmer -- more of a mechanic than a scientist, since I don't
have the CS background that I'm sure you all have.
With that in mind, here's my question:
What is an efficient algorithm for searching a large string (a block of
memory that's 10-20mb in size) for a varying number of strings, each of
which is of varying length and may contain zero or more "don't-care"
characters at any position? The alphabet is large (256 -- searching
for bytes) and the contents of the pattern and the search text are both
fairly evenly distributed across the alphabet. The "don't-care"
characters are fixed-length -- each one will match one and only one
symbol. Patterns are long-ish (between 30 and 150 bytes each).
I've got this implemented as a naive search right now and it works okay
but that's only because my list of "fingerprints" (the memory patterns
I'm searching for) is small (five right now).
I looked at KMP but my lack of CS background immediately became a
problem; I modified KMP to work for "don't-care" symbols but what I
ended up with was a hybrid KMP that ended up matching any number of
characters against a "don't-care". That's... interesting, but not what
I want.
I think the problem is complicated by the fact that I'm searching for
multiple patterns at once; will I be better off finding one good
algorithm and running it multiple times, or finding an algorithm that
checks all the patterns at once? I know that some of the better
skipsearch algorithms (like Boyer-Moore) won't work properly for
multiple patterns at once. I was thinking about Boyer-Moore and
thinking that since it's (bestcase) O(n/m), if my pattern count will be
(coincidentally) about the same as my pattern length, I end up close to
O(n) bestcase performance. That would be just dandy, but I'll be happy
with anything significantly better than the naive search I'm doing.
Thanks in advance for any ideas, I've been banging my unschooled head
against the wall for long enough; I figured it was time to ask the
pro's.
- Next message: Neil W Rickert: "Re: Platonism"
- Previous message: Lester Zick: "Re: Platonism"
- Next in thread: Colin Percival: "Re: (Unsure where to post)"
- Reply: Colin Percival: "Re: (Unsure where to post)"
- Reply: Nicholas King: "Re: (Unsure where to post)"
- Reply: Kent Paul Dolan: "Re: (Unsure where to post)"
- Reply: stephen_at_nomail.com: "Re: (Unsure where to post)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|