Re: XML::Simple XMLIn() and odd chars
- From: "Mumia W." <paduille.4061.mumia.w+nospam@xxxxxxxxxxxxx>
- Date: Tue, 05 Aug 2008 18:47:14 -0500
On 08/05/2008 03:37 PM, dr fence wrote:
I broke it down to a small example. I don't understand why I don't get the same output on both systems.
--------- <OUTPUT System A> ----------------
perl tiny_xml.pl tiny.xml
reading tiny.xml
And They're Off<br>2008
$VAR1 = {
'title' => "And They\x{e2}\x{80}\x{99}re Off<br>2008"
};
[A]$ cat tiny.txt
And They're Off<br>2008
[...]
The sequence \x{e2}\x{80}\x{99} is the UTF-8 encoding of the Unicode character U+2019 (’), the right single quotation mark, which is often used as a typographic apostrophe; the plain ASCII apostrophe is \x{27} ('). For some reason, your input file has the typographic character in it. When I copied and pasted your tiny.xml, I didn't get that character. The copy of tiny.xml that I have base64-encodes to this:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iaXNvLTg4NTktMSI/Pgo8Ym9vaz4KICAgIDx0
aXRsZT5BbmQgVGhleSdyZSBPZmYmbHQ7YnImZ3Q7MjAwODwvdGl0bGU+CjwvYm9vaz4K
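Both claims are easy to check from Perl. The following is only a sketch, not part of the original scripts: it decodes the three bytes from the Dumper output as UTF-8, then decodes the base64 text above and counts any bytes outside 7-bit ASCII. The variable names are made up for illustration; Encode and MIME::Base64 ship with Perl.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# The three bytes that Data::Dumper printed on System A.
my $bytes = "\xE2\x80\x99";

# Interpreted as UTF-8, they form a single character.
my $char = decode('UTF-8', $bytes);
printf "length: %d, code point: U+%04X\n", length($char), ord($char);
# prints "length: 1, code point: U+2019"

# The base64 text of my copy of tiny.xml, joined into one string.
my $encoded = join '',
    'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iaXNvLTg4NTktMSI/Pgo8Ym9vaz4KICAgIDx0',
    'aXRsZT5BbmQgVGhleSdyZSBPZmYmbHQ7YnImZ3Q7MjAwODwvdGl0bGU+CjwvYm9vaz4K';

my $xml = decode_base64($encoded);
print $xml;

# Count bytes above 0x7F; a plain ASCII apostrophe will not show up here.
my @high = grep { ord($_) > 0x7F } split //, $xml;
printf "non-ASCII bytes: %d\n", scalar @high;
# prints "non-ASCII bytes: 0"
```

Running the same byte count against your own tiny.xml should show whether U+2019 is already in the file before XML::Simple ever sees it.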
Also, the shebang line (first line) of your tiny_xml.pl script was wrong (missing a "!"). These files produce what I think is the right output on my system:
--------------file:tiny_xml.pl--------------
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
use CGI qw/header/;

# Send the HTTP header, declaring the charset of the output.
print header(
    '-content-type' => 'text/plain',
    -charset        => 'iso-8859-1',
);

# Parse the XML file and show the result.
my $xs       = XML::Simple->new();
my $filename = 'tiny.xml';
print "reading $filename\n";
my $xml = $xs->XMLin($filename);
print "$xml->{title}\n";
print Dumper($xml);

# Also write the title to a file so it can be inspected with cat.
my $tinytxt = '/dev/shm/tiny.txt';
open my $rf, '>', $tinytxt or die("open failed: $!\n");
print $rf "\$xml->{title}: $xml->{title}\n";
close $rf;
chmod 0666, $tinytxt;
------------file:tiny.xml----------------
<?xml version="1.0" encoding="iso-8859-1"?>
<book>
    <title>And They're Off&lt;br&gt;2008</title>
</book>
------------OUTPUT---------------------
HTTP/1.1 200 OK
Date: Tue, 05 Aug 2008 22:45:15 GMT
Server: Apache/2.2.3 (Debian) PHP/5.2.0-8+etch11 mod_perl/2.0.2 Perl/v5.8.8
Connection: close
Content-Type: text/plain; charset=iso-8859-1
reading tiny.xml
And They're Off<br>2008
$VAR1 = {
'title' => 'And They\'re Off<br>2008'
};
------------end--------------
So you seem to have two problems: U+2019 appears where you don't want it, and system B is not set up to display UTF-8 correctly. The second is not a major problem, since the output is supposed to go to a browser, not the console. Just make sure that the locale en_US.UTF-8 is enabled, run the script from the web server, and send an HTTP header that specifies the charset utf-8.
If you wish to run from an X terminal for debugging purposes (like I do), then you'll need to set LANG=en_US.UTF-8 and start X under that. A terminal emulator that can handle Unicode (such as urxvt) is also a good idea.
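For the UTF-8 route, the important part is that the charset named in the header matches the bytes actually sent. Here is a minimal sketch, assuming a hypothetical title that contains U+2019. It writes the header by hand so it has no dependencies beyond core Perl; in tiny_xml.pl the equivalent would be passing -charset => 'utf-8' to header().

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical title containing the typographic apostrophe U+2019.
my $title = "And They\x{2019}re Off";

# Declare utf-8 in the header, then send bytes that really are UTF-8.
my $header = "Content-Type: text/plain; charset=utf-8\r\n\r\n";
my $body   = encode('UTF-8', "$title\n");

# The body is already encoded, so write raw bytes with no extra layer.
binmode STDOUT;
print $header, $body;
```

An alternative is to put an :encoding(UTF-8) layer on STDOUT with binmode and print the character string directly; either way, encode exactly once.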