Re: XML Parsing Problems with SAX xerces



On Mon, 26 Sep 2005 22:19:23 -0500, "John C. Bollinger"
<jobollin@xxxxxxxxxxx> wrote or quoted :

>Given UTF-8's status as the default encoding, any utility that does not
>support that encoding is handicapped to the point of being downright
>broken. I know of none such, and never expect to see any. With that
>being the case it is safe to encode any XML document you create in
>UTF-8; any service or utility that fails to read it on account of the
>encoding has been designed specifically to prevent you from feeding it a
>document of your own creation. (So why fight it?)

But the problem is that if you let people encode in CP278 (Scandinavian
EBCDIC), you force every reader of that file to support obsolete baggage
as well.

There was no advantage in allowing anything but UTF-8 and perhaps
UTF-16. If people want to write such files for internal purposes, that
is their business, but those files have no business being passed around
as interchange files.

Java has to support all these old encodings to deal with legacy apps,
but XML does not.

The other thing: embedding the encoding in plain text is a bit of a
chicken and egg problem. You have to know the encoding to interpret
the encoding specification. Unicode has the advantage that you can tell
what you have got just by examining the first few bytes.
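
To illustrate the "first few bytes" point, here is a rough sketch of
sniffing a byte order mark in Java. The class and method names
(BomSniffer, detect) are made up for this post, not part of Xerces or
any library. It only recognises files that actually start with a BOM;
a UTF-8 file without one comes back as null, and a real parser would
then have to fall back on further guessing (the XML spec describes
such autodetection in its Appendix F).

```java
class BomSniffer {

    /**
     * Returns the encoding implied by a leading byte order mark,
     * or null if the data does not start with one.
     */
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32 must be tested before UTF-16, because the UTF-32LE
        // mark FF FE 00 00 begins with the UTF-16LE mark FF FE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', '?', 'x', 'm', 'l'};
        System.out.println(detect(sample));
    }
}
```

Nothing comparable works for the legacy 8-bit code pages: no marker in
the bytes tells CP278 apart from any other EBCDIC variant.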

Remember Bill the Cat from Bloom County? I think this decision
deserves one of his hair ball spitting up noises.
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.