Re: Strange 'Â' character output when using simplexml load string
- From: bizt <bissatch@xxxxxxxxxxx>
- Date: Mon, 7 Apr 2008 06:19:28 -0700 (PDT)
On 25 Feb, 10:56, Toby A Inkster <usenet200...@xxxxxxxxxxxxxxxxx> wrote:
Andy Hassall wrote:
bizt <bissa...@xxxxxxxxxxx> wrote:
I am converting an XML string using the simplexml_load_string function. For
some reason it is giving me a Â character dotted around the text.
simplexml always outputs in UTF-8. Is your page's encoding UTF-8?
At a guess, ISO-8859-1 or perhaps ISO-8859-15.
In UTF-8, a lead byte of 0xC2 is used to encode the first half of the
"Latin-1 Supplement" block (U+0080 to U+00BF), which includes a lot of juicy characters such
as currency symbols, fractions, superscript 2 and 3, the copyright and
registered trademark symbols, and the non-breaking space.
However, in ISO-8859-1 and -15, the byte 0xC2 represents an Â, so if UTF-8
is misinterpreted as one of those, then you get Â followed by some other
character.
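To make the byte-level explanation concrete, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escape) and the iconv extension; the variable names are illustrative and not from the original posts. It uses the copyright sign as the example character and simulates a reader that treats UTF-8 bytes as ISO-8859-1:

```php
<?php
// The copyright sign U+00A9 is two bytes in UTF-8: 0xC2 0xA9.
$utf8 = "\u{A9}";
echo bin2hex($utf8), "\n"; // c2a9

// Simulate a reader that wrongly treats those bytes as ISO-8859-1:
// each byte becomes its own character, so 0xC2 turns into "Â".
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);
echo $misread, "\n"; // Â©
```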
Probably the easiest solution would be to take the output from SimpleXML
and pass it through iconv():
$xmlout = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $xmlout);
Note that UTF-8 is capable of representing a far greater range of
characters than ISO-8859-1/-15 are, so certain characters may not properly
survive conversion. (Using the '//TRANSLIT' option tells iconv to do its
best, and if, say, a particular accented character is not available in
ISO-8859-1, then to substitute an unaccented one in its place.)
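As a rough illustration of the conversion above, the sketch below converts two characters that do exist in ISO-8859-15. It assumes PHP 7 or later and an iconv implementation that accepts the '//TRANSLIT' suffix (glibc does); the sample string is made up for the example:

```php
<?php
// "é" (U+00E9) and "€" (U+20AC) both exist in ISO-8859-15,
// so each one converts to a single byte.
$utf8   = "caf\u{E9} 5\u{20AC}";
$latin9 = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $utf8);

echo bin2hex($utf8), "\n";   // 636166c3a92035e282ac
echo bin2hex($latin9), "\n"; // 636166e92035a4
```

For characters that are missing from the target charset, what '//TRANSLIT' substitutes depends on the iconv implementation in use, so no output is shown for that case.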
Toby A Inkster BSc (Hons) ARCS
Hi, I've tried what you said, which worked for one of my pages, but when
I tried it on another I got the following:
Notice: iconv() [function.iconv]: Detected an illegal character in
input string in /home/public_html/search_apartments.php on line 67
I'm using the following to convert my XML string which is fetched via
$result = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $result);
Could it be that, for my $result string, I'm not providing iconv() with
the correct input encoding? If so, is there a way for me to detect the
input encoding?
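One possible cause of that notice is input that is not valid UTF-8 in the first place, although iconv can also raise it for a character it cannot represent in the target charset. The sketch below is one way to check before converting. The helper name utf8_to_latin9 is hypothetical, and the fallback (treating non-UTF-8 input as already single-byte text) is an assumption that may not hold for every source:

```php
<?php
// Hypothetical helper (not from the thread): convert only input that is valid UTF-8.
function utf8_to_latin9(string $s): string
{
    // With the /u modifier PCRE validates the whole subject as UTF-8;
    // preg_match() returns false (not 0) when the bytes are malformed.
    if (preg_match('//u', $s) !== 1) {
        // Assumption: input that is not valid UTF-8 is already single-byte
        // (ISO-8859-1/-15) text, so it is passed through unchanged.
        return $s;
    }
    // @ silences the notice; a false return means the conversion failed.
    $out = @iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $s);
    return $out === false ? $s : $out;
}

echo bin2hex(utf8_to_latin9("caf\xC3\xA9")), "\n"; // valid UTF-8, converted
echo bin2hex(utf8_to_latin9("caf\xE9")), "\n";     // not UTF-8, left as-is
```

A validity check like this can only say whether the bytes could be UTF-8. It cannot identify which single-byte encoding a string uses, so where the XML source declares its encoding, that declaration is the more reliable guide.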