Re: Problem round-tripping with xml.dom.minidom pretty-printer
- From: Robert Bossy <Robert.Bossy@xxxxxxxxxxxx>
- Date: Fri, 29 Feb 2008 18:50:31 +0100
Ben Butler-Cole wrote:
minidom (like any DOM parser, btw) has no means to know whether a blank character is a pretty-print artefact or actual blank content from the original XML.
An additional thing to keep in mind is that toprettyxml does not print
an XML identical to the original DOM tree: it adds newlines and tabs.
When parsed again these blank characters are inserted in the DOM tree as
character nodes. If you toprettyxml an XML document twice in a row, then
the second one will also add newlines and tabs around the newlines and
tabs added by the first. Since you call toprettyxml an infinite number
of times, it is expected that lots of blank characters appear.
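To make the accumulation concrete, here is a minimal sketch of the effect described above. The `roundtrip` helper and the sample document are made up for illustration and are not taken from the original code; the only library calls used are `minidom.parseString` and `toprettyxml`.

```python
from xml.dom import minidom

def roundtrip(xml_text):
    """Parse a string and pretty-print it again (hypothetical helper)."""
    return minidom.parseString(xml_text).toprettyxml()

original = "<root><child>text</child><empty/></root>"

once = roundtrip(original)    # adds newlines and tabs between elements
twice = roundtrip(once)       # re-indents the blank text nodes added by the first pass
thrice = roundtrip(twice)     # and so on: the output keeps growing

for label, text in (("once", once), ("twice", twice), ("thrice", thrice)):
    print(label, len(text))
```

The exact lengths depend on the Python version, but each pass should be longer than the previous one, because the blank character nodes from the earlier pass are themselves indented.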
Right. That's the behaviour I'm asking about, which I consider to be
problematic. I would expect a module providing a parser and pretty-
printer (not just for XML parsers) to be able to conservatively round-
trip.
As far as I can see (and your comments back this up) minidom doesn't
have this property. Unless anyone knows how to get it to behave that
way...
You could write a function that recursively strips all-blank text nodes from the element tree. Before doing so, I suggest you take a look at section 2.10 (White Space Handling) of http://www.w3.org/TR/REC-xml/.
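Something along these lines might do, as a rough sketch rather than a definitive implementation. The names `strip_blank_text` and `pretty` are made up here. It assumes that whitespace-only text nodes are never significant in your documents, and it ignores `xml:space="preserve"`, which section 2.10 says you should honour. It also assumes a recent Python where toprettyxml keeps a lone text child on the same line as its element.

```python
from xml.dom import minidom, Node

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters defined by XML

def strip_blank_text(node):
    """Recursively remove text nodes that contain only XML whitespace."""
    # Iterate over a copy, since we remove children while walking.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_blank_text(child)
    return node

def pretty(xml_text):
    """Parse, drop all-blank text nodes, then pretty-print (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    strip_blank_text(doc)
    return doc.toprettyxml()

first = pretty("<root><child>text</child><empty/></root>")
second = pretty(first)
print(first == second)
```

With the blank nodes stripped before each call, pretty-printing the output of a previous pretty-print should give the same text back, provided the assumptions above hold for your data.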
RB