Re: Perl opting for double-byte chars?
From: Alan J. Flavell (flavell_at_ph.gla.ac.uk)
Date: 09/12/04
- In reply to: Shawn Corey: "Re: Perl opting for double-byte chars?"
Date: Sun, 12 Sep 2004 16:01:45 +0100
A: No!
On Sun, 12 Sep 2004, Shawn Corey blurted out atop a fullquote:
> I got caught on this one too.
Are you sure it was the same?
> See perldoc perluniintro and perldoc perlunicode.
Yup, good advice, already offered.
> Perl v5.8+ has a feature that automatically and silently converts
> its standard (pre-v5.8) strings into UTF-8 strings if it encounters
> a Unicode character.
If by "a Unicode character" you mean one whose code value is greater
than 255, then you're right; but we've been given no evidence here
that such a character has been involved. The only "interesting"
character under discussion has been one which fell into the range
occupied by printable characters in iso-8859-1, namely 160-255
decimal.
Perl 5.8 would only have "upgraded" that to utf8 if it had been
given cause to do so. In 5.8.0, one such cause is a UTF-8 locale: if
LC_ALL, LC_CTYPE or LANG mentions UTF-8, 5.8.0 implicitly puts its
filehandles into UTF-8 mode. See also the discussion in
http://use.perl.org/articles/03/09/26/2231256.shtml?tid=6 , or
http://twiki.org/cgi-bin/view/Codev/UsingPerl58OnRedHat8 , or
the various other articles that pop up when one tries the search that
I had suggested.
My hunch is that's what happened. Maybe I'll be proved wrong; we'll
see.
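To make the distinction concrete, here is a minimal sketch (not from the thread) showing that a character in the 160-255 range is the same character whether Perl stores it as one byte or has "upgraded" it to utf8 internally. U+00E9 is an arbitrary choice for illustration, and the explicit utf8::upgrade() call merely stands in for whatever gave Perl cause to upgrade; it does not reproduce the 5.8.0 locale behaviour. utf8::is_utf8() assumes Perl 5.8.1 or later.

```perl
use strict;
use warnings;
use bytes ();    # load bytes.pm for bytes::length, without byte semantics

# A character from the 160-255 range (U+00E9, chosen only for illustration).
my $s = "caf\x{e9}";

# Code points below 256 in a literal do not set the utf8 flag.
my $before = utf8::is_utf8($s) ? 1 : 0;

# Stand-in for "Perl was given cause to upgrade": do it explicitly on a copy.
my $copy = $s;
utf8::upgrade($copy);
my $after = utf8::is_utf8($copy) ? 1 : 0;

printf "utf8 flag before: %d, after: %d\n", $before, $after;
printf "characters: %d vs %d\n", length($s), length($copy);
printf "internal bytes: %d vs %d\n", bytes::length($s), bytes::length($copy);
print "same string: ", ($s eq $copy ? "yes" : "no"), "\n";
```

Both copies should report four characters and compare equal; only the internal byte count differs (the upgraded copy stores U+00E9 as two bytes).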
> I haven't figured a reliable way around this yet
(which suggests you haven't read the relevant perldocs closely enough)
There are various approaches, depending on what your problem field is
and what you're trying to achieve.
If you force the old behaviour, then you can get what you'd have been
accustomed to before, and you won't suffer the overhead of Perl
processing Unicode; but you'll cut yourself off from the ability to
process a fuller range of characters, writing systems etc.
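For the first approach, one way to keep data byte-oriented, sketched here on the assumption that the data can be treated as opaque bytes, is to request the :raw layer explicitly on every handle, so that no decoding layer (including one implied by a UTF-8 locale under 5.8.0) should get between the file and the program. The in-memory filehandle on $buf is only a stand-in for a real file.

```perl
use strict;
use warnings;

# Four byte values, two of them in the 160-255 range (illustrative data).
my $data = "A\x{a0}\x{e9}\x{ff}";

# Write and read back with explicit :raw layers: nothing is decoded or encoded.
my $buf = '';
open my $out, '>:raw', \$buf or die "open for write: $!";
print {$out} $data;
close $out or die "close: $!";

open my $in, '<:raw', \$buf or die "open for read: $!";
my $back = do { local $/; <$in> };    # slurp the whole buffer
close $in;

binmode STDOUT;    # no encoding layer on STDOUT either
printf "%d bytes written, round trip %s\n",
    length $buf, ($back eq $data ? 'intact' : 'changed');
```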
If you learn how to work with Unicode - and your database /also/ knows
how to work with it - then you can write software that can handle
writing systems which are way outside of mere Latin 1; but you may
incur some processing overhead due to the extra work of Perl handling
Unicode characters.
With care, code can be written so that the overhead only cuts in
when characters outside the iso-8859-1 repertoire are used. That gets
you the best of both worlds, without having to write messy dual-path
code, because Perl takes care of it for you (if you're asking it
right).
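For the second approach, the usual pattern is to decode bytes into characters on input, work on characters throughout, and encode on output. This sketch uses the core Encode module's decode() and encode(); the two input strings are invented to stand for the same word arriving as ISO-8859-1 and as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The same word as it might arrive from two sources (illustrative data).
my $latin1_bytes = "caf\xE9";        # ISO-8859-1 encoded
my $utf8_bytes   = "caf\xC3\xA9";    # UTF-8 encoded

# Decode at the input boundary: from here on both are character strings.
my $from_latin1 = decode('iso-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);

# One code path serves both; length counts characters, not bytes.
printf "equal: %s, length: %d characters\n",
    ($from_latin1 eq $from_utf8 ? 'yes' : 'no'), length $from_utf8;

# Characters beyond Latin-1 go through exactly the same code.
my $greek = "\x{3b1}\x{3b2}\x{3b3}";    # alpha, beta, gamma
printf "greek: %d characters\n", length $greek;

# Encode at the output boundary, in whatever the consumer expects.
my $out = encode('UTF-8', $from_latin1 . ' ' . $greek);
printf "UTF-8 output: %d bytes\n", length $out;
```

In real programs the same decode/encode step is often left to PerlIO layers instead, e.g. open(my $fh, '<:encoding(UTF-8)', $file).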
In general I'd say (except perhaps for diagnostic purposes), if you're
messing around with packing and unpacking characters, then you're
doing it wrong. The key is to grasp Perl's character representation
model, and to work *with* it, not to fight it with hand-packed and
-unpacked representations.
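As an illustration of working with Perl's character model rather than against it, this sketch takes a string apart with split, ord and chr instead of pack/unpack. The sample string is made up, and the U+ dump is the kind of thing one might print only for diagnostics.

```perl
use strict;
use warnings;

# One character below 256 (i-diaeresis) and one above (WHITE SMILING FACE).
my $s = "na\x{ef}ve \x{263a}";

# split, ord and chr see code points, never the internal byte
# representation, whether or not $s happens to be stored as utf8.
my @code_points = map { ord($_) } split //, $s;
my $rebuilt     = join '', map { chr($_) } @code_points;

# Diagnostic dump of the code points (ASCII only, so no wide-character output).
print join(' ', map { sprintf 'U+%04X', $_ } @code_points), "\n";
printf "%d characters, rebuilt %s\n",
    length $s, ($rebuilt eq $s ? 'identical' : 'different');
```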
This assumes that your code only needs to run on >= 5.8.0. If you're
writing code meant to be runnable on older Perls, then you have to put
quite a lot more care into the task of producing something compatible.
ttfn
Q: Should I put my Usenet response on the top of a quote of the entire
previous posting?
http://www.faqs.org/docs/jargon/T/top-post.html