Re: tDOM doesn't support encoding='ASCII'?
- From: pointsman@xxxxxxx (Rolf Ade)
- Date: Fri, 11 Jan 2008 12:54:06 +0000 (UTC)
Neil Madden wrote:
I think the point as far as tdom is concerned is that if it came through
a Tcl channel (and if not, where else did it come from?) then Tcl will
have already converted it to UTF-8 on the way in (unless you
specifically asked for binary encoding), so any XML encoding declaration
is most probably wrong at this stage. So tDOM is absolutely doing the
right thing here in requiring you to remove any erroneous xml encoding
declaration. The file may have started off as ASCII (or some other
encoding), but when tdom sees it it is almost certainly UTF-8.
So, if you know what encoding your files are in then [fconfigure
-encoding] the channel when you read the file and then strip the xml
declaration (or at least the encoding part of it). If you don't know the
encoding then use tDOM::xmlReadFile which will do the right thing in
terms of figuring out the correct encoding to use (following the XML specs).
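The "known encoding" route Neil describes can be sketched in plain Tcl. This is a minimal sketch, not tDOM code: the proc name readXmlWithKnownEncoding is hypothetical, and the regular expression assumes the XML declaration is the very first thing in the data (as the XML spec requires), with no BOM in front of it.

```tcl
# Hypothetical helper (not part of tDOM): read an XML file whose encoding
# you already know. The channel decodes the bytes into a Tcl string, so the
# encoding pseudo-attribute in the declaration no longer describes the data
# and is removed; the rest of the declaration is kept.
proc readXmlWithKnownEncoding {filename tclEncoding} {
    set fd [open $filename r]
    # Tell the channel which encoding the bytes on disk are in.
    fconfigure $fd -encoding $tclEncoding
    set data [read $fd]
    close $fd
    # Drop encoding="..." or encoding='...' from a leading <?xml ...?>.
    regsub {^(<\?xml\s[^?>]*)\s+encoding\s*=\s*("[^"]*"|'[^']*')} \
        $data {\1} data
    return $data
}
```

With tDOM loaded, the result would then be handed to the parser, e.g. set doc [dom parse [readXmlWithKnownEncoding data.xml ascii]]. When the encoding is not known, the post's recommendation is to let [tDOM::xmlReadFile] work it out instead.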
Neil sums it up pretty well. That's much of the rationale for why I
implemented it the way it is, and I don't think that's "bizarre" at
all.
There are even more details, fine points and considerations (even
down-to-earth ones, like the historical reasons for how the interface
evolved). But going into them would probably confuse the topic even more.
It has worked the way it works now for years (at least around 5 years),
and that's the result of a lot of musing and tinkering.
The fact seems to be that this topic comes up on and off. One problem is
that some of the things that add confusion are not really in tdom's
basket. Examples of this:
Tcl channels don't know anything about BOMs. They just handle them like
any other data (that is, they simply hand them through).
In some areas (no offence, folks, but AOLServer people seem to be
notorious here) there still seem to be pre-8.1 binary extensions in
use. That means the parser sees some Tcl_Obj string reps that are in
fact not UTF-8.
And others. Not to mention that the topic raises its head again if
you want to write an XML serialization with an XML declaration that
carries encoding info.
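Two of those points can be sketched in plain Tcl: dropping a BOM that a channel has passed through, and writing a declaration whose encoding name matches what the channel really writes. These are minimal sketches under stated assumptions; stripBom, readUtf8Xml and writeXmlFile are hypothetical names, not tDOM or Tcl commands.

```tcl
# Hypothetical helpers (not part of tDOM or Tcl).

# Read side: a utf-8 channel decodes a BOM like any other character, so it
# can arrive as U+FEFF at the start of the string. Drop it before parsing.
proc stripBom {data} {
    if {[string index $data 0] eq "\uFEFF"} {
        return [string range $data 1 end]
    }
    return $data
}

proc readUtf8Xml {filename} {
    set fd [open $filename r]
    fconfigure $fd -encoding utf-8
    set data [read $fd]
    close $fd
    return [stripBom $data]
}

# Write side: the encoding named in the declaration must match the
# encoding the channel actually uses. tclEncoding is a Tcl encoding name
# (e.g. iso8859-1), xmlName the name written into the declaration
# (e.g. ISO-8859-1); keeping them in sync is the caller's job. Characters
# the target encoding cannot represent are NOT turned into character
# references here.
proc writeXmlFile {filename xml tclEncoding xmlName} {
    set fd [open $filename w]
    fconfigure $fd -encoding $tclEncoding -translation lf
    puts $fd "<?xml version=\"1.0\" encoding=\"$xmlName\"?>"
    puts -nonewline $fd $xml
    close $fd
}
```

The write side is where the declaration really matters: what happens to unrepresentable characters (substitution or an error) depends on the Tcl version and channel configuration, so a robust serializer would escape them as character references before writing.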
But in the end, tdom is a tool for _programmers_. A Tcl programmer
must have a basic understanding of how Tcl handles i18n (or he will
run into problems in the long run). If a programmer has to handle some
data format (and XML is nothing else), he must have a basic understanding
of that format. In the case of XML, one essential point is how XML
handles i18n. Up to now, I haven't been able to come up with a rmmadwim
(read my mind and do what I mean) solution for the problem we're
discussing here.
All that said, there's always room for improvement, and I'm open to
suggestions. But don't expect to hit the nail on the head after 30
seconds of thinking.
rolf