Re: Mysterious xml.sax Encoding Exception



On Feb 5, 9:02 am, JKPeck <JKP...@xxxxxxxxx> wrote:
On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <asmo...@in-nomine.org> wrote:
-On [20080201 19:06], JKPeck (JKP...@xxxxxxxxx) wrote:

In both of these cases, there are only plain, 7-bit ascii characters
in the xml, and it really is valid utf-16 as far as I can tell.

Did you mean to say that the only characters they used in the UTF-16 encoded
file are characters from the Basic Latin Unicode block?


It appears that the root cause of this problem is indeed passing a
Unicode XML string to xml.sax.parseString with an encoding declaration
in the XML of utf-16. This works with the standard distribution on
Windows.

It did NOT work for me with the standard 2.5.1 Windows distribution --
see the code + output that I posted.

It does not work with ActiveState on Windows even though both
distributions report 64K for sys.maxunicode.

So I don't know why the results are different, but the problem is
solved by encoding the Unicode string into utf-16 before passing it to
the parser.
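
For reference, the workaround described above might look roughly like the sketch below. It is an illustration only, not the code posted earlier in the thread: the ElementCollector handler and the sample document are made up for the example, and it is written for a current Python, where encode() returns a bytes object (on 2.5 the same call returns a byte str).

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names as they open."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append(name)


# A Unicode string whose XML declaration claims utf-16 (sample data,
# plain 7-bit characters only, as in the case discussed above).
xml_text = u'<?xml version="1.0" encoding="utf-16"?><root><item>abc</item></root>'

# Encode first, so the bytes handed to the parser really are utf-16
# (the "utf-16" codec prepends a byte order mark).
encoded = xml_text.encode("utf-16")

handler = ElementCollector()
xml.sax.parseString(encoded, handler)
print(handler.elements)
```

The point is only that the declared encoding and the actual bytes agree by the time the parser sees them; it does not explain why the two distributions behave differently when given the unencoded Unicode string.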