Re: UTF-8 encoding problem



shreshth.luthra@xxxxxxxxx wrote:
Hi All,

Now, I have 2 XML files, both saved in UTF-8 format and containing
characters from different languages.

Both of them have a UTF-8 BOM, but only the first file also has UTF-8
declared in the XML declaration at the top of the file.

Now, when I search for a character from another language in that
directory using a third-party GUI for desktop search, it shows that the
character exists in the first file (which also has the XML declaration),
but not in the second file (which has only the BOM).

Is your only problem that the third party software doesn't find the characters?
If so then it sounds as if that software is buggy. XML readers (according to
the XML spec) are required to be able to read UTF-8 and UTF-16; are required to
be able to use the BOM (if any) to distinguish between UTF-8 and UTF-16 in the
absence of an explicit declaration[*]; and are required to assume UTF-8 if
there is no BOM and no declaration.
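
For illustration, here is a rough Python sketch of those three rules. The
function name guess_xml_encoding is made up for this example, and it is a
simplification: it ignores UTF-32, encodings that are not ASCII-compatible,
and anything a transport layer might say. It is not how any particular
parser does it.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little-endian, big-endian

# Matches encoding="..." inside an <?xml ...?> declaration at the very start.
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_xml_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding of raw XML bytes."""
    # A UTF-16 BOM settles the question without reading any further.
    if data.startswith(UTF16_BOMS):
        return "UTF-16"
    # Skip a UTF-8 BOM, if present, so the declaration can be inspected.
    body = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii")
    # Either a UTF-8 BOM with no declaration, or neither BOM nor declaration.
    return "UTF-8"
```

By these rules, both of the files described above should come out as UTF-8,
with or without the declaration.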

It might be interesting to see if the third party software works if you remove
the BOM too. But that wouldn't help much unless you are searching for a
workaround for the third-party's bugs.
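
If you want to try that experiment, something like the following Python
sketch would write a copy of a file without the UTF-8 BOM. The function name
and the idea of writing to a separate destination path are only assumptions
for the example; test on a copy, not your original files.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(src: str, dst: str) -> bool:
    """Copy src to dst, dropping a leading UTF-8 BOM if there is one.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = Path(src).read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    Path(dst).write_bytes(data)
    return had_bom
```

Pointing the search tool at the copy would then show whether the BOM itself
is what confuses it.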

-- chris

([*] "explicit declaration" includes the encoding="xyz" part of the <?xml...
declaration, and also includes any information provided by the transport
layer -- but in this case there isn't a transport layer...)

