Foreign XML files, their encoding and how to deal with it?
Hello,
I am trying to process XML files downloaded from www.archive.org that describe,
e.g., NetLabel releases.
One of those files, a ..._meta.xml file, looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata><identifier>fri012</identifier>
<title>Hogar</title>
<creator>El sueÃƒÂ±o de la casa propia</creator>
...
</metadata>
The creator should be "El sueño de la casa propia", but how do I get
from "sueÃƒÂ±o" to "sueño"?
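The garbled "sueÃƒÂ±o" is consistent with text that was UTF-8-encoded twice somewhere upstream and is then viewed as Windows-1252. "ñ" becomes the bytes C3 B1, those bytes misread as Latin-1 give "Ã±", encoding that again as UTF-8 yields C3 83 C2 B1, and in Windows-1252 the bytes 0x83 and 0xC2 display as "ƒ" and "Â". That is an assumption about how the archive.org data got damaged, not something the file itself states. A short Python sketch of the chain (the helper `double_encode` is hypothetical):

```python
def double_encode(text: str) -> bytes:
    """Simulate text being UTF-8-encoded twice (hypothetical helper)."""
    once = text.encode("utf-8")        # "ñ" -> b"\xc3\xb1"
    misread = once.decode("latin-1")   # b"\xc3\xb1" -> "Ã±"
    return misread.encode("utf-8")     # "Ã±" -> b"\xc3\x83\xc2\xb1"


raw = double_encode("ñ")
print(raw.hex(" "))            # c3 83 c2 b1
print(raw.decode("cp1252"))    # ÃƒÂ±
```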
I use tdom to parse those XML files and tried applying "encoding
convertto ascii" or "encoding convertfrom utf-8", but neither of these
works.
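A likely reason a single "encoding convertfrom utf-8" changes nothing: the parser has already decoded the file as UTF-8, so if the text was double-encoded, the parsed string holds the two characters "Ã±" rather than raw bytes. The usual repair is to map those characters back to bytes one-to-one (ISO 8859-1) and decode the result as UTF-8; in Tcl that would presumably be `encoding convertfrom utf-8 [encoding convertto iso8859-1 $text]`. The sketch below shows the same round trip in Python with the standard-library `xml.etree.ElementTree` standing in for tdom, assuming the file really is double-encoded; the name `undo_double_utf8` and the sample bytes are illustrative only.

```python
import xml.etree.ElementTree as ET


def undo_double_utf8(text: str) -> str:
    """Reverse one extra layer of UTF-8 encoding (hypothetical helper).

    Each character is mapped back to a single byte via Latin-1, and the
    bytes are decoded as UTF-8. If either step fails, the text was
    probably not double-encoded and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Assumed file content: "ñ" stored as the double-encoded bytes C3 83 C2 B1.
raw = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
       b"<metadata><creator>El sue\xc3\x83\xc2\xb1o de la casa propia</creator></metadata>")

root = ET.fromstring(raw)
creator = root.findtext("creator")   # decoded once by the parser: "El sueÃ±o ..."
print(undo_double_utf8(creator))     # El sueño de la casa propia
```

Because the helper falls back to the original string on failure, already-correct text such as "El sueño" passes through untouched.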
In UTF-8 the "ñ" would show up as "Ã±", so what about the other two
characters, "ƒÂ", in between?
Since the XML files don't have a byte order mark, I cannot detect on
my own which encoding is really used.
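For UTF-8 a byte order mark is optional, and the XML specification treats a document with neither a BOM nor an encoding declaration as UTF-8. Here the declaration says UTF-8 and the bytes are valid UTF-8, so the parser is arguably doing the right thing and the damage happened before the file was written. What can still be checked after parsing is whether a string looks double-encoded. The heuristic below is one possible approach, not a reliable detector: it flags non-ASCII text whose characters, taken as Latin-1 bytes, form valid UTF-8. Genuine Latin-1 text that happens to look like "Ã±" would be flagged too. `looks_double_encoded` is a hypothetical name.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic check for one extra layer of UTF-8 encoding.

    Returns True when the text contains non-ASCII characters and
    re-reading those characters as Latin-1 bytes gives valid UTF-8.
    """
    if text.isascii():
        return False  # pure ASCII reads the same in Latin-1, Windows-1252 and UTF-8
    try:
        text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


for sample in ["El sueÃ±o de la casa propia", "El sueño de la casa propia", "Hogar"]:
    print(sample, looks_double_encoded(sample))
```

Only the first sample is flagged; the correctly decoded title fails the UTF-8 step and the ASCII-only one is skipped.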
Can someone please help me out?
Best regards,
Martin Lemburg