Re: Regex: Why is overreaching necessary?
- From: Uri Guttman <uri@xxxxxxxxxxxxxxx>
- Date: Sat, 10 Feb 2007 22:20:50 -0500
"SJ" == Shannon Jacobs <Shannon.Jacobs.nospam@xxxxxxxxx> writes:
SJ> Dealing with an array of fixed length strings. Goal is to select based
SJ> on certain columns. After rather lengthy study of the camel book and
SJ> searching on the web for various examples, I thought this should work:
SJ> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);
SJ> It did not. I consulted with a heavy Perler, and after a few minutes
SJ> of wrestling with the problem, he suggested something like this (as I
SJ> tinkered it into working):
SJ> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);
you should show some sample data as well so we can see what you are
matching. as jurgen said, that regex is painful to read. even good perl
hackers will have trouble deciphering it quickly, and that means it is
not good perl IMO.
also, this line has $1121 where the previous one had plain 1121 (no $),
so i am not sure which is correct.
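the $ matters more than it looks. inside a regex perl interpolates $1121 as a variable (capture group number 1121), which is almost certainly undefined, so the first alternative becomes empty and matches anything. a minimal sketch of that effect, assuming a made-up 68-char line that contains none of the wanted codes:

```perl
use strict;
use warnings;
no warnings 'uninitialized';   # $1121 is undef here; silence that warning for the demo

# hypothetical 68-char line containing none of the wanted codes
my $line = 'x' x 68;

# with the stray $: $1121 interpolates as an empty string,
# so the group is effectively (|1217|1256|2033)
my $with_dollar = ($line =~ /^.{50,62}($1121|1217|1256|2033).{6,18}$/) ? 1 : 0;

# without it, one of the four codes must really be present
my $without_dollar = ($line =~ /^.{50,62}(1121|1217|1256|2033).{6,18}$/) ? 1 : 0;

print "with dollar: $with_dollar, without dollar: $without_dollar\n";
# prints "with dollar: 1, without dollar: 0"
```

so if the $ is really in the code, the "working" version may be selecting every line of a plausible length rather than the ones you want.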
SJ> My idea in the broken example was to ignore the first 50 and last 6
SJ> characters in each line, which was supposed to leave only the 12
SJ> characters in the middle to search against. My fuzzy understanding of
SJ> the working version is that I first had to match the entire thing, and
SJ> then let Perl fish for candidate matches by truncating down towards
SJ> 50?
no need to match the last 6 chars at all. leaving off the trailing
.{6}$ won't change which lines match unless some lines are of
different lengths.
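a rough sketch of that point, assuming (from your description) lines of 50 + 12 + 6 = 68 chars with a 4-char code sitting somewhere in the 12-char field. the sample line is invented. note that .{50}(code).{6}$ only adds up to 60 chars, which may be why the first attempt found nothing:

```perl
use strict;
use warnings;

# hypothetical record: 50 filler chars, a 12-char field, 6 trailing chars
my $line = ('a' x 50) . 'xx1217xxxxxx' . ('b' x 6);

# original attempt: the code must start at column 50 and the line must be 60 long
my $exact = ($line =~ /^.{50}(1121|1217|1256|2033).{6}$/) ? 1 : 0;

# skip 50, allow the code to start at any of the 9 positions in the field,
# and do not bother matching the tail at all
my $window = ($line =~ /^.{50}.{0,8}(1121|1217|1256|2033)/) ? 1 : 0;

print "exact: $exact, window: $window\n";
# prints "exact: 0, window: 1"
```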
SJ> The examples above are slightly simplified for purposes of
SJ> explanation. Here is the actual code, just in case I did something
SJ> wrong in the tweaking:
SJ> @foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).{6,18}$/,@foo1);
that doesn't look like a fixed offset: the initial skip is anywhere
from 50 to 62 chars. if the search value can't appear in that leading
part, why not just grep for the value itself? is the search value
something with alternation, as the lines above suggest? then a faster
approach might be to grab the part you want and look it up in a hash of
wanted values. alternation can be very slow, especially with many
choices (due to backtracking).
in fact as you have been told, substr and a hash lookup might be the
perfect thing for this (but i am not sure since the leading skip can
vary in size). again, showing some real data would help as we could see
what variants there are, what the searched for parts look like (and if
they are not found earlier in the string), etc.
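here is one way the substr and hash idea could look. it is only a sketch: the layout (50 skipped chars, a 12-char field, 4-char codes), the sample lines and the has_wanted_code helper are all assumptions, since we haven't seen the real data.

```perl
use strict;
use warnings;

# wanted codes from the example regex, stored as hash keys for fast lookup
my %wanted = map { $_ => 1 } qw(1121 1217 1256 2033);

# assumed layout: 50 filler chars, a 12-char field, 4-char codes
my ($skip, $field_len, $code_len) = (50, 12, 4);

# true if any 4-char slice of the 12-char field is a wanted code
sub has_wanted_code {
    my ($line) = @_;
    return 0 if length($line) < $skip + $field_len;
    my $field = substr($line, $skip, $field_len);
    for my $off (0 .. $field_len - $code_len) {
        return 1 if $wanted{ substr($field, $off, $code_len) };
    }
    return 0;
}

# invented sample lines
my @foo1 = (
    ('a' x 50) . '1217        ' . ('b' x 6),   # code at start of field
    ('a' x 50) . '    2033    ' . ('b' x 6),   # code in middle of field
    ('a' x 50) . '9999        ' . ('b' x 6),   # unwanted code
);

my @foo2 = grep { has_wanted_code($_) } @foo1;
print scalar(@foo2), " of ", scalar(@foo1), " lines selected\n";
# prints "2 of 3 lines selected"
```

if the code always starts at one known column, the loop goes away and a single substr($line, 50, 4) lookup is enough.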
uri
--
Uri Guttman ------ uri@xxxxxxxxxxxxxxx -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org