Re: Get € past a XML parser

From: Rutger Claes (news_at_rgc.be)
Date: 01/08/05


Date: Sat, 08 Jan 2005 10:50:29 +0100

Manuel Lemos wrote:

> Hello,
>
> on 01/07/2005 11:05 AM Rutger Claes said the following:
>> I'm having troubles getting the euro sign through an XML parser.
>>
>> With the following test code:
>> <?php
>> $string = "<root><test>&#8364;</test></root>";
>
> You need to explicitly declare that the output encoding is UTF-8 because
> ISO-8859-1 only comprises 8-bit Latin characters. ISO-8859-15 would be
> the correct encoding but I don't think Expat supports any encoding
> besides UTF-8 or ISO-8859-1.
>
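
A minimal sketch of what declaring the output encoding could look like with
PHP's Expat-based xml_* functions, assuming the xml extension is loaded. The
variable names and the closure are illustrative only; they are not taken from
the original test code, which is truncated above.

```php
<?php
// Same sample document as in the quoted test code.
$xml = '<root><test>&#8364;</test></root>';
$collected = '';

$parser = xml_parser_create();

// Ask the parser to hand character data to the handlers as UTF-8.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

// Collect all character data seen inside elements.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$collected) {
    $collected .= $data;
});

xml_parse($parser, $xml, true);

// Print the bytes that came out of the parser as hex.
echo bin2hex($collected), "\n";
```

If the bytes printed are e2 82 ac, the euro sign came through as UTF-8, and
whatever displays the result has to be told it is UTF-8 as well.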

You're right. When I enforce UTF-8 on my XML from the moment it leaves the
DOM object, through the SAX parser and Tidy, I get some wrong symbols:
â,¬. But when I tell my browser (Konqueror) to use charset UTF-8, it works.
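
The wrong symbols are consistent with correct UTF-8 bytes being read as a
single-byte charset. A small sketch of that effect, assuming the mbstring
extension is available and assuming the browser falls back to Windows-1252
(the variable names are made up for illustration):

```php
<?php
// The character reference &#8364; decoded to UTF-8 is the three bytes E2 82 AC.
$euroUtf8 = html_entity_decode('&#8364;', ENT_QUOTES, 'UTF-8');

// Pretend those three bytes are Windows-1252 text and convert them to UTF-8,
// which is what a browser effectively does when it guesses the wrong charset.
$misread = mb_convert_encoding($euroUtf8, 'UTF-8', 'Windows-1252');

echo bin2hex($euroUtf8), "\n"; // the raw bytes of the euro sign
echo $misread, "\n";           // three separate characters instead of one
```

So the data itself is fine at this point; only the charset used to display it
is wrong.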

The problem now is that even though I have a
<meta .... content-type: text/html; charset=UTF-8" /> and a header( '...
charset=UTF-8' ) call, the browser still doesn't pick it up when it is set to
auto charset. I've tried Mozilla Firefox too, with the same result.
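
For comparison, a minimal sketch of how the two declarations are usually
written in PHP. The helper function names are hypothetical, and the sketch
assumes nothing has been sent to the browser before header() runs. Note that
a charset in the real HTTP Content-Type header takes precedence over the
<meta> element, so a different charset added by the web server configuration
would win over what the page itself declares.

```php
<?php
// Hypothetical helper: builds the HTTP Content-Type header line.
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Hypothetical helper: builds the matching <meta> element for the page head.
function content_type_meta(string $charset): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . $charset . '" />';
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header(content_type_header('UTF-8'));
}

echo '<html><head>', content_type_meta('UTF-8'), '</head>';
echo "<body>\xE2\x82\xAC</body></html>\n"; // euro sign as UTF-8 bytes
```

Checking the response headers the browser actually receives would show
whether the charset parameter arrives as intended.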

So now I have a working charset, but nobody will see it. Is there a way to
fix this?

 Thanks for the answer,
 Rutger Claes

-- 
Rutger Claes                                                rgc@rgc.tld
Replace tld with top level domain of belgium to contact me    pgp:0x3B7D6BD6
Do not reply to the from address.   It's read by /dev/null and sa-learn only

