Re: utf-8, was Re: Three questions: UTF-8, DBM, hash of lists, ...
From: Alan J. Flavell (flavell_at_ph.gla.ac.uk)
Date: 01/15/05
- Next message: Abigail: "Re: convention regarding lexical filehandles"
- Previous message: Martin Kissner: "Re: Adding a delimiter inbetween number characters and letter characters"
- In reply to: Wes Groleau: "Re: utf-8, was Re: Three questions: UTF-8, DBM, hash of lists, ..."
- Next in thread: Wes Groleau: "perl 5.8 bug ? (was Re: utf-8, was Re: Three questions: ....)"
- Reply: Wes Groleau: "perl 5.8 bug ? (was Re: utf-8, was Re: Three questions: ....)"
Date: Sat, 15 Jan 2005 22:00:10 +0000
On Sat, 15 Jan 2005, Wes Groleau wrote:
> Welcome to Usenet.
Indeed. It seems from your response, and the rarity of responses from
other contributors, that you're in the position to offer us all a
valuable tutorial on the topic.
> I don't want to know what it does internally, as long as everything
> comes out UTF-8 and is decoded as such going in.
Fine, then we're pretty much up to speed already, and I'm sorry that I
misinterpreted your original posting.
> > Which is not to deny that there can also be situations where you'd
> > want to write unicode characters directly - but then you have to
> > be a lot more careful with how you edit and transfer your source
> > code. See
> > http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Effects-of-Character-Semantics
> > for more details.
>
> Yes, I read that. I'm trying to minimize the need for "being
> careful" about all those ten zillion details by specifying
> "everything is UTF-8."
Point made. If you're really in control of all that data then you're
in a much happier position than I've ever been ;-)
>   I     1   STDIN is assumed to be in UTF-8
>   O     2   STDOUT will be in UTF-8
>   E     4   STDERR will be in UTF-8
>   S     7   I + O + E
>   i     8   UTF-8 is the default PerlIO layer for input streams
>   o    16   UTF-8 is the default PerlIO layer for output streams
>   D    24   i + o
>
> Seems to say -CSDA should handle all my IO
It does, doesn't it? Did I miss the specific problem you were having,
and your test case that demonstrated it?
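For anyone who would rather set this up inside the script than on the #! line, here is a rough approximation of -CSDA, not an exact equivalent. It assumes Perl 5.8.1 or later with a version of the open pragma that supports the :std subpragma, and the variable names are made up for illustration. The -C letters are bit values that add up. A (worth 32, per perlrun) is the @ARGV flag, which is not in the excerpt quoted above, so -CSDA is the same as -C63.

```perl
use strict;
use warnings;

# The -C letters are bit flags (values as listed in perlrun); combined letters add up.
my %flag = (I => 1, O => 2, E => 4, i => 8, o => 16, A => 32);
my $S    = $flag{I} + $flag{O} + $flag{E};   # 7
my $D    = $flag{i} + $flag{o};              # 24
my $CSDA = $S + $D + $flag{A};               # 63, so -CSDA means -C63

# Roughly S and D: put :utf8 on STDIN/STDOUT/STDERR (:std) and make :utf8
# the default layer for handles opened later in this lexical scope.
use open qw(:std :utf8);

# Roughly A: treat each command-line argument as UTF-8 encoded bytes.
utf8::decode($_) for @ARGV;

# Show which PerlIO layers STDOUT now carries.
my @stdout_layers = PerlIO::get_layers(*STDOUT);
print "-CSDA is -C$CSDA; STDOUT layers: @stdout_layers\n";
```

Note that use open is lexically scoped, so code in other files is not affected by it. That is one reason the command-line switch is the simpler way to say "everything is UTF-8".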
> > > But
> > > another man page seemed to say that "use utf8;" covered
> > > something that -CSD did not, so I put that in, too.
> >
> > The perlunicode pod, for the version of Perl that you're using,
> > should be your "bible". Don't go tossing-in arbitrary bits and
> > pieces that
>
> I have 5.8.1 but no pod, so my 'elsewhere' is the man pages
> derived from the pod.
No disagreement there. More than one way to...read the documentation.
> > See what
> > http://www.perldoc.com/perl5.8.4/pod/perlunicode.html#Important-Caveats
> > says about "use utf8;".
>
> It says the same as my man page: that the pragma is needed
> to "enable UTF-8" in scripts.
Hmmm? At 5.8.4 (and I don't remember it being different in recent
versions before that) it says (emphasis mine):
    As a compatibility measure, the use utf8 pragma must be explicitly
    included to enable recognition of UTF-8 *in the Perl scripts
    themselves* (in string or regular expression literals, or in
    identifier names) on ASCII-based machines or to recognize UTF-EBCDIC
    on EBCDIC-based machines. These are the *only times* when an explicit
    use utf8 is needed.
> However, 'man perlrun' says the -CSD handles the IO,
Indeed, and (fwiw) I don't see anything there about encoding of the
script's source code itself.
> and perlunicode says for script encoding, see encoding
> which says that UTF-8 already works in scripts.
It "works", yes, but (as I understand it, anyway) I think you have to
ask for it. It could just be that if you call for locale-awareness
with -CL, and you have utf-8 in your locale, it will come out in the
wash; but I don't see any harm in asking for it directly, if you're so
certain that you'll never not want it (sorry for the double-negative).
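The "you have to ask for it" point is easy to check. This sketch uses only core Perl, and the variable names are illustrative. A string eval stands in for a script file saved as UTF-8. The code compiles the same literal (the bytes C3 A9, i.e. U+00E9 encoded as UTF-8) with and without use utf8 and compares the resulting lengths. It also shows an all-ASCII \x{...} escape, which needs no pragma at all.

```perl
use strict;
use warnings;

# "caf" followed by the bytes C3 A9, the UTF-8 encoding of U+00E9 (e acute).
# This is how that literal looks inside a source file saved as UTF-8.
my $source_bytes = "caf\xC3\xA9";

# Compile the literal as Perl source twice, via string eval (a stand-in for
# a real script file): once without and once with the utf8 pragma.
my $without = eval "no utf8; length('$source_bytes')";
defined $without or die $@;
my $with = eval "use utf8; length('$source_bytes')";
defined $with or die $@;

# A character written as an ASCII escape needs no pragma.
my $escaped = length("caf\x{E9}");

print "without use utf8: $without\n";   # 5: the two bytes count separately
print "with use utf8:    $with\n";      # 4: the bytes are decoded to one character
print "ASCII escape:     $escaped\n";   # 4
```

Writing non-ASCII characters as \x{...} escapes keeps the source file pure ASCII, so editing and transferring it need no special care.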
> So, things are a little unclear. I put in both,
Looks as if you're (a) right and (b) unlikely to cause any harm.
> was able to read UTF-8 text, put it in a DBM hash, and
> get it back out. That's good enough for now.
Good luck
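The DBM round trip can also be made explicit. A DBM file stores byte strings. One defensive approach is to encode keys and values to UTF-8 on the way in and decode them on the way out, instead of relying on how a particular Perl version handles character strings in a tied hash. The sketch below does this with the core SDBM_File and Encode modules. The file name and sample data are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Encode qw(encode decode);
use File::Temp qw(tempdir);

# Hypothetical DBM file in a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words";

tie(my %db, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

my $key   = "caf\x{E9}";        # character string: c, a, f, e acute
my $value = "\x{263A} smile";   # includes a character above U+00FF

# Going in: turn character strings into UTF-8 bytes.
$db{ encode('UTF-8', $key) } = encode('UTF-8', $value);

# Coming out: encode the lookup key the same way, then decode the value.
my $back = decode('UTF-8', $db{ encode('UTF-8', $key) });

untie %db;

print $back eq $value ? "round trip ok\n" : "round trip FAILED\n";
```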