Re: strange effect with [:lower:] in perl

From: Abigail (abigail_at_abigail.nl)
Date: 28 Oct 2003 19:48:44 GMT

Alan J. Flavell (flavell@ph.gla.ac.uk) wrote on MMMDCCX September
MCMXCIII in <URL:news:Pine.LNX.4.53.0310281819040.28979@ppepc56.ph.gla.ac.uk>:
][ On Mon, 27 Oct 2003, Abigail wrote:
][
][ > "" > [a-z0-9] # Lowercase letters *and* digits.
][ > ""
][ > "" Surely that only refers to a subset of what Unicode considers to be
][ > "" "letters"?
][ >
][ > Yeah, but that's what [:lower:] seems to do too:
][ >
][ > $ perl -wle 'for (0x00 .. 0x80) {
][
][ Surely you meant to set the limit at 0xff or so for this
][ demonstration?

Yes, I did. However, it doesn't change the outcome.

][ > printf "%02x %s\n", $_, chr if chr () =~ /[[:lower:]]/}'
][
][ [snip]
][
][ > No lowercase accented letters here.
][
][ Curious. No surprise when the limit's set at 0x80, as I'm sure you'd
][ agree; but I must admit I was surprised at the accented lower-case
][ letters up to 0xff not being counted, despite the accented lower case
][ letters above 0x100 being counted. Prima facie I think there's
][ something wrong here, no? (This is perl 5.8.0 per RedHat 9).

Maybe, maybe not. I'm still confused about what Perl is doing with Unicode,
and considering all the discussions on p5p, not everyone wants it to do
the same thing.

And considering that the fonts I use are unable to display Unicode,
I'm not that interested anyway.

Abigail

-- 
$_ = "\x3C\x3C\x45\x4F\x54\n" and s/<<EOT/<<EOT/ee and print;
"Just another Perl Hacker"
EOT

