Re: XML::Simple and utf8 woes
- From: Dennis Roesler <noone@xxxxxxxxxxx>
- Date: Mon, 27 Mar 2006 18:31:33 -0500
corff@xxxxxxxxxxxxxxxxxx wrote:
If, in a fit of desperation, I modify the output of XMLout() with
NumericEscape=>2, all I get in the output is that, e.g., the a umlaut of
Morgend&auml;mmerung (sorry for this encoding-independent symbolic
notation here!) is represented as &#195;&#164;, which happens to be the
decimal values of the two octets comprising U+00e4, or Latin small a
with umlaut.
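For anyone skimming the thread, the symptom quoted above is what you would expect if the string reaching the escaping step still holds raw UTF-8 octets rather than decoded characters. Here is a minimal sketch of that difference using only the core Encode module. The helper numeric_escape() is hypothetical and written just for this illustration; it is not XML::Simple's implementation.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of XML::Simple): replace every non-ASCII
# element of the string with a decimal numeric character reference.
sub numeric_escape {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $s;
}

my $octets = "Morgend\xC3\xA4mmerung";   # UTF-8 bytes, not yet decoded
my $chars  = decode('UTF-8', $octets);   # character string containing U+00E4

print numeric_escape($octets), "\n";     # each of the two bytes is escaped
print numeric_escape($chars),  "\n";     # the single character is escaped
```

The first line comes out with two references (195 and 164), the second with one (228), so seeing the two-reference form suggests the data was never decoded on the way in.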
I've been following this thread because I have been struggling with XML::Simple writing and reading back an XML file in cp932 encoding. NumericEscape is what resolved the writing. For the reading, I had to set the encoding in the XML declaration of the cp932-encoded file to x-sjis-cp932 so that XML::Simple would parse it properly, which took me a while to figure out :-(.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
use Encode qw(:all);

my $file    = $ARGV[0];
my $outfile = "cp932out.xml";

# Read the source as UTF-8; write the result through a cp932 layer.
open my $utf8in,   "<:encoding(utf8)",  $file    or die "In $file: $!";
open my $cp932out, ">:encoding(cp932)", $outfile or die "Out $outfile: $!";

my $utf8So = XMLin($utf8in, KeepRoot => 1, ForceArray => 1, SuppressEmpty => undef);
print Dumper($utf8So);

# NumericEscape is what makes the write work; the declaration names the
# encoding as x-sjis-cp932 so that XMLin() can read the file back.
XMLout($utf8So, OutputFile => $cp932out,
       AttrIndent => 1, KeepRoot => 1,
       NumericEscape => 1,
       XMLDecl => "<?xml version='1.0' encoding='x-sjis-cp932'?>");

close $utf8in;
close $cp932out;

# Read the cp932 file back in to check the round trip.
open my $cp932in, "<:encoding(cp932)", $outfile or die "XML In $outfile: $!";
my $cp932So = XMLin($cp932in, ForceArray => ['Line_Items'], SuppressEmpty => undef);
print Dumper($cp932So);
Without NumericEscape in the XMLout() call, I get the following error when writing the cp932-encoded data:
not well-formed (invalid token) at line 75, column 41, byte 3001 at /opt/perl/lib/site_perl/5.8.0/PA-RISC1.1-thread-multi/XML/Parser.pm line 185
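One way to look at why the escaping helps: once the non-ASCII characters are numeric references, the text is plain ASCII, and encoding it to cp932 leaves it byte-for-byte unchanged, so no multibyte sequences reach the parser at all. A rough sketch with the core Encode module follows. numeric_escape() is a hypothetical stand-in that escapes every non-ASCII character; XML::Simple's NumericEscape levels differ in which range they escape, so check its documentation for the exact behaviour.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for numeric escaping (not XML::Simple's own code):
# every non-ASCII character becomes a decimal character reference.
sub numeric_escape {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $s;
}

my $text    = "\x{65E5}\x{672C}\x{8A9E}";   # three kanji as Perl characters
my $escaped = numeric_escape($text);        # now ASCII only

my $raw_cp932     = encode('cp932', $text);      # real multibyte cp932 data
my $escaped_cp932 = encode('cp932', $escaped);   # unchanged by the encoding

printf "raw:     %d bytes, non-ASCII bytes present: %s\n",
    length $raw_cp932, ($raw_cp932 =~ /[^\x00-\x7F]/ ? 'yes' : 'no');
printf "escaped: %d bytes, identical after encoding: %s\n",
    length $escaped_cp932, ($escaped_cp932 eq $escaped ? 'yes' : 'no');
```

The price is that the file no longer contains readable Japanese text, only references, which may or may not matter for whoever consumes it.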
My first attempt was to just use IO layers.
open my $utf8in,   "<:encoding(utf8)",  $file    or die "In $file: $!";
open my $cp932out, ">:encoding(cp932)", $outfile or die "Out $outfile: $!";

# Drop the original XML declaration and write one naming the new encoding.
my $fline = <$utf8in>;
print $cp932out qq~<?xml version='1.0' encoding='x-sjis-cp932'?>~;

# Copy the rest of the file; the IO layers do the transcoding.
while (<$utf8in>) { print $cp932out $_; }

open my $cp932in, "<:encoding(cp932)", "cp932out.xml" or die "XML In $outfile: $!";
my $cp932So = XMLin($cp932in, ForceArray => ['Line_Items'], SuppressEmpty => undef);
print Dumper($cp932So);
This results in:
not well-formed (invalid token) at line 37, column 35, byte 886 at /opt/perl/lib/site_perl/5.8.0/PA-RISC1.1-thread-multi/XML/Parser.pm line 185
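I can't say for certain what the parser is choking on here, but one thing worth ruling out with the plain IO-layer approach is input characters that have no cp932 mapping at all. The sketch below uses only the core Encode module; unmappable_chars() is a hypothetical helper name, and the sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical diagnostic: return the characters of $text that Encode cannot
# map to $encoding, as "U+XXXX" strings in order of first appearance.
sub unmappable_chars {
    my ($text, $encoding) = @_;
    my (%seen, @bad);
    for my $ch (split //, $text) {
        next if $seen{$ch}++;
        my $copy = $ch;   # encode() with a CHECK value may modify its argument
        my $ok = eval { encode($encoding, $copy, Encode::FB_CROAK()); 1 };
        push @bad, sprintf('U+%04X', ord $ch) unless $ok;
    }
    return @bad;
}

# Made-up sample: Latin text with an a umlaut followed by two kanji.
my $line = "Morgend\x{E4}mmerung \x{65E5}\x{672C}";
print join(', ', unmappable_chars($line, 'cp932')), "\n";
```

Running each decoded input line through a check like this before printing it to the cp932 handle would at least show whether the failing line and column line up with a character the encoding cannot represent.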
Cheers
Dennis