Re: could XML::Simple handling chinese character?
- From: "Peter J. Holzer" <hjp-usenet2@xxxxxx>
- Date: Sun, 17 Jun 2007 19:03:01 +0200
On 2007-06-17 08:09, Mumia W. <paduille.4061.mumia.w+nospam@xxxxxxxxxxxxx> wrote:
> On 06/17/2007 01:10 AM, havel.zhang wrote:
>> hi mirod:
>> when i changed chinese character with english word, it works fine.
>> my versions of perl is 5.8.8 .
> I also ran your program without problems on Perl 5.8.4 / Linux. You
> should enable a utf8 locale on your computer and tell Perl to use that
> encoding when reading from the file.
No, you should not (well, using a utf8 locale may be a good idea anyway,
but it doesn't have anything to do with his problem). Telling perl to
use a specific encoding when reading XML files is at best ineffectual
and at worst causes problems.
> When I tested your program, I first saved part1.xml to a file in utf8
> format;
This is obviously necessary as the XML file starts with
<?xml version="1.0" encoding="utf-8"?>
> then I copied your script to a file in utf8 format.
The script doesn't contain any non-ASCII characters so there is no
difference between ASCII format, Latin-1 format, UTF-8 format, etc.
> I also added the "encoding" pragma to tell Perl that the script was
> written in utf8.
The script is pure ASCII. Of course that means it's UTF-8, too, but it's
also a dozen other charsets which are supersets of ASCII.
> And my locale is currently set to utf8.
Irrelevant. XML files contain their own encoding. They *must* *not* be
read differently depending on the locale. If the XML declaration
contains encoding="utf-8", the file must be parsed as UTF-8, regardless
of the charset of the current locale. Since you can't know the encoding
of an XML file before parsing it, it is the responsibility of the XML
parser to determine the encoding.
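To illustrate where that information lives, here is a minimal sketch in
Perl. It is not how expat does it: a real parser also looks at byte order
marks and handles UTF-16, and this regex only works for ASCII-compatible
encodings. The name declared_encoding is a hypothetical helper, not part
of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the encoding named in the XML declaration
# at the start of a raw byte string.
sub declared_encoding {
    my ($bytes) = @_;
    if ($bytes =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/) {
        return $2;
    }
    # Assumption: no declaration and no BOM, so fall back to the XML default.
    return 'UTF-8';
}

print declared_encoding(qq{<?xml version="1.0" encoding="utf-8"?>\n<a/>}), "\n";
print declared_encoding(qq{<?xml version='1.0' encoding='gb2312'?>\n<a/>}), "\n";
```

Note that nothing here consults the locale or any PerlIO layer; the
answer comes from the bytes of the file alone.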
> So there's no way for Perl to be unprepared to deal with utf8 encoded
> data on my system right now,
Nothing you described above "prepared your system to deal with utf8
encoded" XML files.
> and Chinese characters should be stored in either utf8 or gb2312
> files.
Or GB18030 or EUC-CN or whatever contains the necessary characters. It
is only necessary that the XML declaration matches the contents of the
file.
> I suspect your problem is encoding confusion. Either you don't have a
> suitable locale installed (e.g. utf8),
I don't think you can install perl 5.8.8 without support for UTF-8,
regardless of any system-specific locales.
> or you stored the file in one encoding (e.g. gb2312), but you're
> trying to read it in another encoding (utf8 ?).
The parser must read it in UTF-8 encoding since that's what the file
says it is. Your suspicion that the file really is in some other
encoding seems likely (especially since Havel posted in gb2312).
It's also possible that the parser used by XML::Simple is broken, but
judging from the error message it is XML::Parser which in turn uses
expat, so I think that's unlikely.
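One quick way to check that suspicion is to test whether the bytes of the
file are well-formed UTF-8 at all. The following is only a sketch using
the core Encode module: is_valid_utf8 is a hypothetical helper name, and
the two sample strings stand in for the contents of part1.xml, which I
have not seen.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $bytes is a well-formed UTF-8 byte sequence.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $chars  = "\x{4E2D}\x{6587}";                 # two Chinese characters
my $utf8   = Encode::encode('UTF-8',  $chars);   # what the declaration promises
my $euc_cn = Encode::encode('euc-cn', $chars);   # what a gb2312 editor would write

print "UTF-8 bytes:  ", (is_valid_utf8($utf8)   ? "valid" : "invalid"), "\n";
print "EUC-CN bytes: ", (is_valid_utf8($euc_cn) ? "valid" : "invalid"), "\n";
```

If the real file fails such a check, either the file should be re-saved
as UTF-8 or the XML declaration should be changed to name the encoding
the file is actually in.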
hp
--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@xxxxxx |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"