Re: Regex



Chas. Owens wrote:
On Nov 13, 2007 12:18 PM, Gunnar Hjalmarsson <noreply@xxxxxxxxx> wrote:
Chas. Owens wrote:
On Nov 13, 2007 10:58 AM, Chas. Owens <chas.owens@xxxxxxxxx> wrote:
snip
I believe you want /\d[a-z]{2}/i.
snip

Oops, I didn't pay attention to my own warning about \d. I should have said

/[0-9][a-z]{2}/i
Are you saying that \d is no longer equivalent to [0-9]? If so, which
digits does \d match besides [0-9]?
snip

Yep, that is what I am saying. The \d character class matches any
decimal digit character, and that includes all of the decimal digits in
Unicode (general category Nd), not just 0-9. The following program outputs:

Wide character in print at t.pl line 8.
Mongolian digit three is ᠓
it is a number (using \d)
it is not a number (using [0-9])
it is not a number (using looks_like_number)
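
The program itself was snipped further down the thread, so what follows is not the original t.pl. It is only a minimal sketch of a script that should give comparable output, assuming a Perl with Scalar::Util available. The helper name classify() is made up for illustration, and the binmode call is an addition that avoids the "Wide character" warning shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: returns one report line per test for a string.
sub classify {
    my ($char) = @_;
    return (
        "it is " . ($char =~ /\d/            ? "" : "not ") . "a number (using \\d)",
        "it is " . ($char =~ /[0-9]/         ? "" : "not ") . "a number (using [0-9])",
        "it is " . (looks_like_number($char) ? "" : "not ") . "a number (using looks_like_number)",
    );
}

# Encode output as UTF-8 so print does not warn about wide characters.
binmode STDOUT, ':encoding(UTF-8)';

my $digit = "\x{1813}";    # U+1813 MONGOLIAN DIGIT THREE
print "Mongolian digit three is $digit\n";
print "$_\n" for classify($digit);
```

Passing "3" to classify() instead should report "a number" for all three tests, which is the contrast the output above is illustrating.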

I assume the reasoning for this is that regexes are text based, not
datatype based. That is, the string "\x{1811}\x{1812}\x{1813}" is a
number ("123" in Mongolian) in text even if it isn't one Perl can do
math with. <sarcasm>Frankly, I blame all the foreigners, they should
just learn English and use ASCII.</sarcasm>

<code snipped>

Thanks, Chas., for the explanation. I totally forgot about the Mongolian digits. ;-) Sometimes this encoding thing feels like rocket science...

Isn't this changed behaviour of Perl a compatibility issue? Would it be possible to tell a Perl program to use something other than utf8? If I understand it correctly, that's what you can do with \w by saying "use locale".
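
For what it's worth, one way to get ASCII-only matching in newer Perls is the /a regex modifier. It postdates this thread (it arrived in Perl 5.14), so it is not something the posters could have used at the time. The sketch below assumes Perl 5.14 or later, and the two helper names are invented for illustration.

```perl
use 5.014;    # the /a modifier needs Perl 5.14 or later
use strict;
use warnings;

# Hypothetical helpers for comparing the two behaviours.
# Unicode rules: \d matches any decimal digit character.
sub has_unicode_digit { return $_[0] =~ /\d/  ? 1 : 0 }
# With /a, \d (like \s and \w) is restricted to the ASCII range.
sub has_ascii_digit   { return $_[0] =~ /\d/a ? 1 : 0 }

for my $s ("3", "\x{1813}") {
    printf "U+%04X: plain=%d, with /a=%d\n",
        ord($s), has_unicode_digit($s), has_ascii_digit($s);
}
```

On older Perls, spelling the class out as [0-9], as suggested earlier in the thread, stays the portable choice.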

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl