Re: Regex
- From: noreply@xxxxxxxxx (Gunnar Hjalmarsson)
- Date: Tue, 13 Nov 2007 23:53:12 +0100
Chas. Owens wrote:
On Nov 13, 2007 12:18 PM, Gunnar Hjalmarsson <noreply@xxxxxxxxx> wrote:
Chas. Owens wrote:
On Nov 13, 2007 10:58 AM, Chas. Owens <chas.owens@xxxxxxxxx> wrote:
snip
I believe you want /\d[a-z]{2}/i.
snip
Oops, I didn't pay attention to my own warning about \d, I should have said
/[0-9][a-z]{2}/i
Are you saying that \d is no longer equivalent to [0-9]? If so, which
digits does \d match besides [0-9]?
Yep, that is what I am saying. The \d character class matches any
decimal digit character, and that includes the decimal digits of every
script in Unicode, not just [0-9]. The following program outputs:
Wide character in print at t.pl line 8.
Mongolian digit three is ᠓
it is a number (using \d)
it is not a number (using [0-9])
it is not a number (using looks_like_number)
I assume the reasoning for this is that regexes are text based, not
datatype based. That is, the string "\x{1811}\x{1812}\x{1813}" is a
number ("123" in Mongolian) as text, even if it isn't one Perl can do
math with. <sarcasm>Frankly, I blame all the foreigners, they should
just learn English and use ASCII</sarcasm>
<code snipped>
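The program itself was snipped from this reply. A minimal sketch that reproduces the same three checks could look like the following. It is a reconstruction, not the snipped code: the helper name check_digit is made up for illustration, and the binmode line is an addition that avoids the "Wide character in print" warning shown in the output above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Assumption: encode output as UTF-8 so that printing U+1813 does not
# trigger the "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: returns one line per test, mirroring the three
# checks in the quoted output.
sub check_digit {
    my ($char) = @_;
    return (
        'it is ' . ($char =~ /^\d$/          ? '' : 'not ') . 'a number (using \d)',
        'it is ' . ($char =~ /^[0-9]$/       ? '' : 'not ') . 'a number (using [0-9])',
        'it is ' . (looks_like_number($char) ? '' : 'not ') . 'a number (using looks_like_number)',
    );
}

my $three = "\x{1813}";    # MONGOLIAN DIGIT THREE
print "Mongolian digit three is $three\n";
print "$_\n" for check_digit($three);
```

Calling check_digit("3") instead reports "a number" for all three tests, which is the ASCII-only behaviour that [0-9] gives for every input.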
Thanks, Chas., for the explanation. I totally forgot about the Mongolian digits. ;-) Sometimes this encoding thing feels like rocket science...
Isn't this changed behaviour of Perl a compatibility issue? Would it be possible to tell a Perl program to use something other than utf8? If I understand it correctly, that's what you can do with \w by saying "use locale".
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl