Re: Unexpected regex Behavior



John W. Krahn wrote:
Mark Shelor wrote:

Is it true that setting $/ to a reference to an integer (to read
fixed-length records) affects the meaning of the end-of-string anchor
($) in regexes?


No, it is not true.


For example, let's say I'm reading 4096-byte chunks from a file and
wish to do special processing if any chunk ends with the carriage-return
character (\015). So I start with code that looks like:

local $/ = \4096;    # read fixed-length 4096-byte records
while (defined (my $rec = <F>)) {
    if ($rec =~ /\015$/) {    # intended: "chunk ends with CR"
        # do special processing ...
    }
    # ... other processing
}

Oddly, this doesn't seem to work. It ends up matching chunks that
contain, but don't necessarily end with, \015.

Instead, I have to do this:

local $/ = \4096;    # read fixed-length 4096-byte records
while (defined (my $rec = <F>)) {
    if (substr($rec, -1) eq "\015") {    # last byte is CR
        # do special processing ...
    }
    # ... other processing
}

Any idea what's going on?


perldoc perlre
[snip]
By default, the "^" character is guaranteed to match only the beginning
of the string, the "$" character only the end (or before the newline at
the end), and Perl does certain optimizations with the assumption that
the string contains only one line. Embedded newlines will not be
matched by "^" or "$". You may, however, wish to treat a string as a
multi-line buffer, such that the "^" will match after any newline
within the string, and "$" will match before any newline. At the cost
of a little more overhead, you can do this by using the /m modifier on
the pattern match operator. (Older programs did this by setting $*,
but this practice is now deprecated.)
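To see what the /m modifier described above changes, here is a small standalone sketch (core Perl only; the sample buffer is made up for illustration). It checks whether a CR sitting just before an embedded newline counts as being at "the end":

```perl
use strict;
use warnings;

# Made-up buffer: a CR LF pair in the middle, no newline at the end.
my $buf = "line1\015\012line2";

# Without /m, $ matches only at the very end of the string or just
# before a final newline, so the embedded CR is not at an "end".
my $plain = ($buf =~ /\015$/) ? 1 : 0;

# With /m, $ also matches before every embedded newline.
my $multi = ($buf =~ /\015$/m) ? 1 : 0;

print "without /m: $plain, with /m: $multi\n";    # without /m: 0, with /m: 1
```

So /m makes $ match in more places, not fewer; it is not what the fixed-length loop in the question needs.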


So the regular expression will match a string ending in either "\015" or
"\015\012" (a CR followed by a final newline). If you want it to match only
at the very end of the string, use /\015\z/ or the substr() expression.
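A minimal sketch that compares the three end anchors on a few made-up chunks, assuming nothing beyond core Perl (the sub name anchor_matches is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: returns three 1/0 flags saying whether $s
# "ends in CR" according to $, \Z and \z respectively.
sub anchor_matches {
    my ($s) = @_;
    return (
        ($s =~ /\015$/  ? 1 : 0),    # end, or just before a final "\n"
        ($s =~ /\015\Z/ ? 1 : 0),    # same rule as $ (without /m)
        ($s =~ /\015\z/ ? 1 : 0),    # only the absolute end of the string
    );
}

my @samples = (
    [ 'ends in CR'    => "data\015"     ],
    [ 'ends in CR LF' => "data\015\012" ],
    [ 'CR in middle'  => "da\015ta"     ],
);

for my $pair (@samples) {
    my ($name, $s) = @$pair;
    printf "%-14s  \$:%d  \\Z:%d  \\z:%d\n", $name, anchor_matches($s);
}
```

The "ends in CR LF" sample is the case from the original question: $ and \Z both accept it, while \z does not.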


Now it all makes perfect sense. Thanks for citing the reference, and thanks to you and MSG for the helpful replies.

As a side remark on MSG's response: both $ and \Z match *before* a newline at the end of the string, so only /\015\z/ will work in this case.
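Putting it together, here is a minimal sketch of the fixed-length loop using the \z anchor. It reads from an in-memory filehandle with a 4-byte record size instead of a real file and 4096 bytes, purely so it runs standalone; cr_terminated_records is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in fixed-size records and return the
# 0-based indexes of records whose final byte is a carriage return.
sub cr_terminated_records {
    my ($fh, $size) = @_;
    local $/ = \$size;    # fixed-length records of $size bytes
    my @hits;
    my $i = 0;
    while (defined(my $rec = <$fh>)) {
        push @hits, $i if $rec =~ /\015\z/;    # true end of record only
        $i++;
    }
    return @hits;
}

# Demo data: three 4-byte records "ab\015\012", "cde\015", "fg\015h".
my $data = "ab\015\012cde\015fg\015h";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @hits = cr_terminated_records($fh, 4);
close $fh;

print "records ending in CR: @hits\n";    # records ending in CR: 1
```

With /\015$/ in place of /\015\z/, record 0 ("ab\015\012") would also be reported, which is the behavior described in the original question; the substr($rec, -1) eq "\015" test agrees with \z here.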

Regards, Mark


