Re: Unexpected regex Behavior
- From: Mark Shelor <mshelor@xxxxxxxx>
- Date: Sun, 14 May 2006 16:16:39 -0700
John W. Krahn wrote:
Mark Shelor wrote:
Is it true that defining $/ to an integer reference (to read
fixed-length records) affects the meaning of the end-of-string symbol
($) in regexes?
No, it is not true.
For example, let's say I'm reading 4096-byte chunks from a file, and
wish to do special processing if any chunk ends with the carriage-return
character (\015). So, I start with code that looks like:
    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while ($rec =~ /\015$/) {
            # do special processing ...
        }
        ...
    }
Oddly, this doesn't seem to work. It ends up matching chunks that
contain, but don't necessarily end with, \015.
Instead, I have to do this:
    local $/ = \4096;
    while (defined (my $rec = <F>)) {
        while (substr($rec, -1) eq "\015") {
            # do special processing ...
        }
        ...
    }
Any idea what's going on?
perldoc perlre
[snip]
By default, the "^" character is guaranteed to match only the beginning
of the string, the "$" character only the end (or before the newline at
the end), and Perl does certain optimizations with the assumption that
the string contains only one line. Embedded newlines will not be
matched by "^" or "$". You may, however, wish to treat a string as a
multi-line buffer, such that the "^" will match after any newline
within the string, and "$" will match before any newline. At the cost
of a little more overhead, you can do this by using the /m modifier on
the pattern match operator. (Older programs did this by setting $*,
but this practice is now deprecated.)
So the regular expression will match with either "\015" or "\015\012" at the
end of the string. If you want it to only match at the end of the string use
/\015\z/ or the substr() expression.
Now it all makes perfect sense. Thanks for citing the reference, and thanks to you and MSG for the helpful replies.
As a side remark to MSG's response, both $ and \Z match *before* a newline at the end of the string, so only /\015\z/ will work in this case.
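For anyone who wants to see the difference directly, here is a minimal sketch that runs all four tests against three sample chunks. The chunk contents and the helper name cr_checks are made up for illustration and are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: report which end-of-chunk tests succeed for $rec.
sub cr_checks {
    my ($rec) = @_;
    my %result = (
        dollar    => ($rec =~ /\015$/  ? 1 : 0),  # end, or before a final newline
        big_z     => ($rec =~ /\015\Z/ ? 1 : 0),  # same rule as $ without /m
        small_z   => ($rec =~ /\015\z/ ? 1 : 0),  # very end of the string only
        last_char => (substr($rec, -1) eq "\015" ? 1 : 0),
    );
    return \%result;
}

# Made-up sample chunks; \012 is the newline character.
my @samples = (
    [ 'ends with CR',   "data\015"     ],
    [ 'ends with CRLF', "data\015\012" ],
    [ 'CR in middle',   "da\015ta"     ],
);

for my $sample (@samples) {
    my ($name, $rec) = @$sample;
    my $r = cr_checks($rec);
    printf "%-16s \$=%d  \\Z=%d  \\z=%d  substr=%d\n",
        $name, @{$r}{qw(dollar big_z small_z last_char)};
}
```

Only the chunk ending in "\015\012" separates the tests: $ and \Z both match it, while \z and the substr() comparison do not. That is the case that made the original loop look as if it matched chunks not ending in \015.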
Regards, Mark