Re: Strange 'Â' character output when using simplexml load string



On 25 Feb, 10:56, Toby A Inkster <usenet200...@xxxxxxxxxxxxxxxxx>
wrote:
Andy Hassall wrote:
bizt <bissa...@xxxxxxxxxxx> wrote:

I am converting an XML string using the simplexml_load_string function. It is
giving me a Â character for some reason, dotted around the text.

 simplexml always outputs in UTF-8. Is your page's encoding UTF-8?

At a guess, ISO-8859-1 or perhaps ISO-8859-15.

In UTF-8, a "prefix" of an 0xC2 byte introduces the code points U+0080 to
U+00BF, the lower half of the "Latin-1 Supplement" block, which includes a
lot of juicy characters such as currency symbols, fractions, superscript 2
and 3, the copyright and registered trademark symbols, and the non-breaking
space.

However in ISO-8859-1 and -15, the byte 0xC2 represents an Â, so if UTF-8
is misinterpreted as one of those, then you get Â followed by some other
nonsense character.
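
A small sketch of the effect described above, using a non-breaking space as the example character. The variable names are made up for illustration; the "misreading" is simulated by asking iconv() to treat the UTF-8 bytes as ISO-8859-1.

```php
<?php
// U+00A0 (non-breaking space) encoded as UTF-8 is the two bytes 0xC2 0xA0.
$utf8 = "\xC2\xA0";

// Simulate a consumer that wrongly assumes the bytes are ISO-8859-1:
// each byte is decoded as its own character and re-encoded as UTF-8.
$misread = iconv('ISO-8859-1', 'UTF-8', $utf8);

// 0xC2 became U+00C2 (Â) and 0xA0 became U+00A0, so one character turned into two.
echo bin2hex($misread), "\n"; // c382c2a0
```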

Probably the easiest solution would be to take the output from SimpleXML
and pass it through iconv():

        $xmlout = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $xmlout);

Note that UTF-8 is capable of representing a far greater range of
characters than ISO-8859-1/-15 are, so certain characters may not properly
survive conversion. (Using the '//TRANSLIT' option tells iconv to do its
best, and if, say, a particular accented character is not available in
ISO-8859-1, then to substitute an unaccented one in its place.)
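
As a rough illustration of the conversion, here is a sketch with a sample string whose characters all exist in ISO-8859-15. The string and variable names are assumptions for the example, not taken from the original script.

```php
<?php
// "café €5" in UTF-8: é is 0xC3 0xA9, the euro sign is 0xE2 0x82 0xAC.
$utf8 = "caf\xC3\xA9 \xE2\x82\xAC5";

// Both characters exist in ISO-8859-15, so each becomes a single byte.
$latin9 = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $utf8);

echo bin2hex($latin9), "\n"; // 636166e920a435
echo strlen($utf8), ' -> ', strlen($latin9), " bytes\n"; // 10 -> 7 bytes
```

For characters that have no equivalent in the target encoding, what '//TRANSLIT' substitutes depends on the iconv implementation PHP was built against, so the result is worth checking on the actual server.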

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 26 days, 15:55.]

                               Bottled Water
         http://tobyinkster.co.uk/blog/2008/02/18/bottled-water/


Hi, I've tried what you said, which worked for one of my pages, but when
I tried it on another I got the following:

Notice: iconv() [function.iconv]: Detected an illegal character in
input string in /home/public_html/search_apartments.php on line 67

I'm using the following to convert my XML string, which is fetched via
cURL:

$result = iconv('UTF-8', 'ISO-8859-15//TRANSLIT', $result);

Could it be that I'm not providing iconv() with the correct input
encoding for my $result string? If so, is there a way for me to detect
the input encoding?

Cheers

Martyn
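
One possible cause of that notice is input that is not actually valid UTF-8. Encodings cannot be detected reliably in general, but it is possible to test whether a string is well-formed UTF-8 before converting it. The sketch below is only an assumption about how that check could look; isValidUtf8() is a hypothetical helper, and the sample strings stand in for the fetched $result.

```php
<?php
// Hypothetical helper: true if $bytes is a well-formed UTF-8 sequence.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

$good = "caf\xC3\xA9"; // "café" in UTF-8
$bad  = "caf\xE9";     // "café" in ISO-8859-1; a lone 0xE9 is not valid UTF-8

var_dump(isValidUtf8($good)); // bool(true)
var_dump(isValidUtf8($bad));  // bool(false)
```

If the check fails, the charset named in the HTTP Content-Type header or in the XML declaration is a better guide to the real input encoding than guessing.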