Re: Mysterious xml.sax Encoding Exception



On Feb 4, 4:09 pm, John Machin <sjmac...@xxxxxxxxxxx> wrote:
On Feb 5, 9:02 am, JKPeck <JKP...@xxxxxxxxx> wrote:



On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <asmo...@in-nomine.org> wrote:
-On [20080201 19:06], JKPeck (JKP...@xxxxxxxxx) wrote:

In both of these cases, there are only plain, 7-bit ascii characters
in the xml, and it really is valid utf-16 as far as I can tell.

Did you mean to say that the only characters they used in the UTF-16 encoded
file are characters from the Basic Latin Unicode block?

It appears that the root cause of this problem is indeed passing a
Unicode XML string to xml.sax.parseString with an encoding declaration
in the XML of utf-16. This works with the standard distribution on
Windows.

It did NOT work for me with the standard 2.5.1 Windows distribution --
see the code + output that I posted.

It does not work with ActiveState on Windows, even though both
distributions report 64K for sys.maxunicode.

So I don't know why the results are different, but the problem is
solved by encoding the Unicode string into utf-16 before passing it to
the parser.

Interesting. In the course of installing and testing with
ActiveState, I upgraded from the standard distribution 2.5.0 to
2.5.1. The former worked; the latter does not (with the original
code). So that .1 seems to matter here, and that probably accounts
for why ActiveState raised the exception and the standard 2.5.0 did
not.
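
For anyone who lands on this thread later, here is a minimal sketch of the
workaround described above: encode the Unicode string to utf-16 bytes and
hand the bytes to xml.sax.parseString. It is written for a current Python 3
rather than the 2.5.x builds discussed in this thread, so it only illustrates
the idea and does not reproduce the original failure. The handler class, the
helper name parse_utf16, and the sample document are made up for
illustration.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_utf16(xml_text):
    """Encode the text as utf-16 bytes, then parse those bytes.

    The "utf-16" codec prepends a byte order mark, so the bytes handed to
    the parser agree with the encoding="utf-16" declaration in the document.
    """
    handler = ElementCollector()
    xml.sax.parseString(xml_text.encode("utf-16"), handler)
    return handler.names


# Sample document: plain 7-bit ascii characters, declared as utf-16.
doc = ('<?xml version="1.0" encoding="utf-16"?>'
       '<root><item>plain ascii</item></root>')
names = parse_utf16(doc)
print(names)
```

The point of the sketch is only that the parser receives bytes whose actual
encoding matches the declaration, instead of a Unicode string whose internal
representation may not.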

-Jon