Re: [XML::Simple-2.12] problems parsing non ASCII strings
- From: Jul <REMjulienpleeOVE@xxxxxxxxxxxxx>
- Date: Tue, 12 Jul 2005 23:43:16 +0000
On Tue, 12 Jul 2005 19:16:53 +0200, Michel Rodriguez wrote:
> Jul wrote:
>> module: XML::Simple-2.12 (also tried 2.14)
>> perl version: 5.00503
>
> Wow! Do you know how old that is? 5 or 6 years old?
I know it's very, very old; that's why I mentioned it. I'm looking for a
way to work around it, as I did for the other perl 5.6 modules I use :o)
I guess we could sometimes rename "hosting solutions" to "hosting problems",
but that would be less attractive to the customer ;-)
>> I need to parse and write an XML configuration file which contains
>> non-ASCII characters (like 'é', in French). I chose XML::Simple
>> with XML::Parser for these tasks, but everything works fine if and only
>> if I do not include any special character in the file; otherwise the hash
>> returned by XMLin() is totally messed up.
>
> What is the encoding of your file? My guess is that it is in either
> ISO-8859-1 (or -15) or some kind of windows-12nn
>
> What happens is that the data is read, probably by expat, and converted
> to UTF-8. The "totally messed up" characters are in fact perfectly valid
> UTF-8 characters, that your terminal (or whatever you use to display
> them) is not set to display.
>
> If XML::Simple can read it, then the encoding must be declared in the XML
> declaration, at the beginning of the XML file.
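A quick way to check whether the "messed up" values really are valid UTF-8, as suggested above, is to look at their raw bytes. The sketch below uses a hypothetical helper, `hex_bytes()`, and assumes a plain byte string, which is what perl 5.005 gets back from XML::Parser. An 'é' that expat converted to UTF-8 shows up as the two bytes `C3 A9`, while the same letter in ISO-8859-1 is the single byte `E9`.

```perl
use strict;

# Hypothetical helper: render every byte of a string as two hex digits.
# Assumes a byte string (no perl 5.6+ character semantics), as under 5.005.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord $_ } split //, $s;
}

# "été" as UTF-8 bytes: each 'é' becomes C3 A9, 't' stays 74.
print hex_bytes("\xC3\xA9t\xC3\xA9"), "\n";   # prints "C3 A9 74 C3 A9"
```

Running it on a value taken from the XMLin() hash tells you which of the two encodings you are actually holding.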
The default encoding should be ISO-8859-1 or -15; that's why I
expected to get the same encoding back.
With the encoding attribute set in the declaration, it works better, you're
right, and I was surprised to see that UTF-8 is also supported, even
with perl 5.005 :-)
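Setting the encoding attribute can also be done programmatically before the document reaches XMLin(). This is only a sketch: `ensure_encoding_decl()` is a hypothetical helper, and it assumes the bytes really are in the encoding you pass it (declaring ISO-8859-1 on a file that is actually UTF-8 would be just as wrong as omitting the declaration). It leaves an existing `encoding` attribute alone, adds one to a bare `<?xml version="1.0"?>`, or prepends a full declaration if there is none.

```perl
use strict;

# Hypothetical helper: make sure an XML string declares $encoding.
sub ensure_encoding_decl {
    my ($xml, $encoding) = @_;
    # Declaration already names an encoding: leave it untouched.
    return $xml if $xml =~ /^<\?xml[^>]*\bencoding\s*=/;
    # Declaration without encoding: insert the attribute before "?>".
    return $xml if $xml =~ s/^<\?xml([^>]*?)\s*\?>/<?xml$1 encoding="$encoding"?>/;
    # No declaration at all: prepend one.
    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
}

my $doc = ensure_encoding_decl("<config><name>\xE9t\xE9</name></config>",
                               'ISO-8859-1');
print $doc, "\n";
# With XML::Simple installed, the string could then be parsed, e.g.:
#   use XML::Simple;
#   my $ref = XMLin($doc);
```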
> Your choices are either to convert those characters back to the original
> encoding (look at the Unicode::* modules on CPAN), or to bite the Unicode
> bullet and learn how to work with UTF-8 data. In the long run the second
> option makes more sense, but YMMV.
Now the original character is displayed as ISO-8859-15, but encoded
in UTF-8. You're right again! lol
Now I wonder whether UTF-8 is the default charset, or whether there
is an option for it in XML::Simple or XML::Parser. I looked through
those modules' documentation but didn't find much.
Otherwise, I'll try to convert the data outside XML::Simple.
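Converting the data outside XML::Simple could look roughly like the sketch below. It assumes byte strings (as under perl 5.005) and only handles characters in the U+0080 to U+00FF range, whose two-byte UTF-8 form starts with `\xC2` or `\xC3`; anything above that is left as raw UTF-8. ISO-8859-15 puts the euro sign and a few other letters at different positions than ISO-8859-1, so those would need extra handling (the Unicode::* modules mentioned above are the more complete route). Both `utf8_to_latin1()` and `latin1_tree()` are hypothetical names.

```perl
use strict;

# Hypothetical helper: map two-byte UTF-8 sequences for U+0080..U+00FF
# back to single ISO-8859-1 bytes: 110000xx 10yyyyyy -> xxyyyyyy.
sub utf8_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr(((ord($1) << 6) & 0xC0) | (ord($2) & 0x3F))/eg;
    return $s;
}

# Hypothetical helper: walk the hash/array tree returned by XMLin()
# and convert every key and scalar value.
sub latin1_tree {
    my ($node) = @_;
    if (ref $node eq 'HASH') {
        my %out;
        foreach my $k (keys %$node) {
            $out{utf8_to_latin1($k)} = latin1_tree($node->{$k});
        }
        return \%out;
    }
    if (ref $node eq 'ARRAY') {
        return [ map { latin1_tree($_) } @$node ];
    }
    return defined $node ? utf8_to_latin1($node) : $node;
}

# Example on a tree shaped like XMLin() output ("été" as UTF-8 bytes).
my $tree = latin1_tree({ name => "\xC3\xA9t\xC3\xA9" });
print $tree->{name} eq "\xE9t\xE9" ? "converted\n" : "unchanged\n";
```

Doing the conversion once on the whole tree keeps the rest of the program working with ISO-8859-1 strings only.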
> But really, processing XML with perl 5.00503 seems like a bad idea to me.
I agree with you, but I have no choice right now. I have perl 5.005 in one
hand and a project to get off the ground in the other. That's what I have to deal with.
Maybe another way to parse a configuration file would be easier, but I
like the idea of having a reason to play with XML, and I didn't really find
what I wanted in the modules I tested before.
Thank you very much for your help, it's been really useful to me.
Julien