Re: Regex: Why is overreaching necessary?



"SJ" == Shannon Jacobs <Shannon.Jacobs.nospam@xxxxxxxxx> writes:

SJ> Dealing with an array of fixed length strings. Goal is to select based
SJ> on certain columns. After rather lengthy study of the camel book and
SJ> searching on the web for various examples, I thought this should work:

SJ> X @foo2 = grep(/^.{50}(1121|1217|1256|2033).{6}$/,@foo1);

SJ> It did not. I consulted with a heavy Perler, and after a few minutes
SJ> of wrestling with the problem, he suggested something like this (as I
SJ> tinkered it into working):

SJ> @foo2 = grep(/^.{50,62}($1121|1217|1256|2033).{6,18}$/,@foo1);

you should show some sample data as well so we can see what you are
matching. as jurgen said, that regex is painful to read. even good perl
hackers will have trouble deciphering it quickly, and that means it is
not good perl IMO.

also this line has $1121 and the previous one didn't have the $, so i am
not sure which is correct.
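
for what it's worth, here is a minimal sketch of what that stray $ would
do if it is really in the code. inside a regex, $1121 is interpolated as
a (normally unset) capture variable, so the first alternative becomes
empty and matches anything. the records below are made up, assuming
68-char lines, and are not your data:

```perl
use strict;
use warnings;

# Hypothetical 68-character records; only the first contains a wanted code.
my @foo1 = (
    ('a' x 50) . '1217' . ('b' x 14),
    'c' x 68,
);

my (@with_dollar, @without_dollar);
{
    # $1121 is an unset capture variable here, so it interpolates as an
    # empty string (and would warn about an uninitialized value).
    no warnings 'uninitialized';
    @with_dollar = grep { /^.{50,62}($1121|1217|1256|2033).{6,18}$/ } @foo1;
}
@without_dollar = grep { /^.{50,62}(1121|1217|1256|2033).{6,18}$/ } @foo1;

# Number of lines selected by each version of the pattern.
print scalar(@with_dollar), " vs ", scalar(@without_dollar), "\n";
```

if the two counts differ on your real data, the version with the $ is
selecting on line length more than on the codes.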


SJ> My idea in the broken example was to ignore the first 50 and last 6
SJ> characters in each line, which was supposed to leave only the 12
SJ> characters in the middle to search against. My fuzzy understanding of
SJ> the working version is that I first had to match the entire thing, and
SJ> then let Perl fish for candidate matches by truncating down towards
SJ> 50?

no need to ignore the last 6 chars, as the trailing part won't affect
the match unless some lines are of different lengths.
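
a minimal sketch of the "skip the first 50, search the 12 in the middle"
idea, with no trailing part in the regex at all. the layout (50 + 12 + 6
chars) is taken from your description; the records themselves are made
up, and i am assuming every line is at least 62 chars long:

```perl
use strict;
use warnings;

# Hypothetical layout: 50 chars to skip, a 12-char field, 6 trailing chars.
my @foo1 = (
    ('a' x 50) . '    1217    ' . ('b' x 6),        # code inside the field
    ('a' x 50) . '1256        ' . ('b' x 6),        # code at start of field
    '1217' . ('a' x 46) . (' ' x 12) . ('b' x 6),   # code only outside the field
);

# Cut out the 12-char field first, then match against just that piece.
my @foo2 = grep { substr($_, 50, 12) =~ /1121|1217|1256|2033/ } @foo1;

print scalar(@foo2), " of ", scalar(@foo1), " lines selected\n";
```

since the regex only ever sees the 12-char field, a code that shows up
elsewhere in the line can't cause a false hit.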


SJ> The examples above are slightly simplified for purposes of
SJ> explanation. Here is the actual code, just in case I did something
SJ> wrong in the tweaking:

SJ> @foo2 = grep(/^.{50,62}($form_values{'a_SEARCH_VALUE'}).{6,18}$/,@foo1);

that doesn't seem to be a fixed offset. the initial skip is anywhere
from 50 to 62 chars. if the search value can't appear in that leading
part, why not just grep for the value by itself? is the search value
something with alternation, as the lines above suggest? then a faster
approach might be to grab the part you want and look it up in a hash of
wanted values. alternation can be very slow, especially with many
choices (due to backtracking).

in fact, as you have been told, substr and a hash lookup might be the
perfect thing for this (but i am not sure, since the leading skip can
vary in size). again, showing some real data would help, as we could see
what variants there are, what the searched-for parts look like (and
whether they can also turn up earlier in the string), etc.
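
a minimal sketch of the substr and hash idea. it only works as written
under the assumption that the code is exactly 4 chars wide and always
starts at offset 50, which may not hold for your data. the records are
made up:

```perl
use strict;
use warnings;

# Assumption: the code is exactly 4 chars wide and always starts at offset 50.
my %wanted = map { $_ => 1 } qw(1121 1217 1256 2033);

# Hypothetical 68-character records.
my @foo1 = (
    ('a' x 50) . '1217' . ('b' x 14),   # wanted
    ('a' x 50) . '9999' . ('b' x 14),   # not wanted
    ('a' x 50) . '2033' . ('b' x 14),   # wanted
);

# One substr and one hash lookup per line; no regex, no backtracking.
my @foo2 = grep { $wanted{ substr($_, 50, 4) } } @foo1;

print scalar(@foo2), " of ", scalar(@foo1), " lines selected\n";
```

adding more wanted codes only grows the hash, so the per-line cost stays
the same no matter how many choices there are.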

uri

--
Uri Guttman ------ uri@xxxxxxxxxxxxxxx -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org