Re: convert xhtml back to html



bryan rasmussen top-posted:
On Thu, Apr 24, 2008 at 9:55 PM, Stefan Behnel <stefan_ml@xxxxxxxxx> wrote:
from lxml import etree

tree = etree.parse("thefile.xhtml")
tree.write("thefile.html", method="html")

http://codespeak.net/lxml

wow, that's pretty nice there.

Just to know: what's the performance like on XML instances of 1 GB?

That's a pretty big file, although you didn't mention what kind of XML
language you want to handle and what you want to do with it.

lxml is pretty conservative in terms of memory:

http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/

But the exact numbers depend on your data. lxml holds the XML tree in memory,
and that tree is a lot bigger than the serialised data. So, for example, if
you have 2 GB of RAM and want to parse a serialised 1 GB XML file full of
small elements that each hold a single integer into an in-memory tree, expect
heavy swapping and a long wait (long enough to go for lunch). With a lot of
long text string content instead, it might still fit.

However, lxml also has a couple of step-by-step and stream parsing APIs:

http://codespeak.net/lxml/parsing.html#the-target-parser-interface
http://codespeak.net/lxml/parsing.html#the-feed-parser-interface
http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk

They might do what you want.
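
As a rough illustration of the iterparse approach, here is a minimal sketch
that walks a document element by element instead of building the whole tree
first. The function name sum_values, the "value" tag and the sample data are
made up for the example, and the fallback import assumes the standard
library's ElementTree is an acceptable stand-in when lxml is not installed.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib iterparse() is close enough for this sketch.
    import xml.etree.ElementTree as etree


def sum_values(source, tag="value"):
    """Add up the integer text of every <tag> element while parsing."""
    total = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += int(elem.text)
            # Drop the element's text and children once it has been handled.
            elem.clear()
    return total


# Hypothetical sample data standing in for a much larger file.
sample = io.BytesIO(b"<root><value>1</value><value>2</value><value>3</value></root>")
print(sum_values(sample))
```

Note that clear() only empties each element; the emptied elements stay
attached to the root, so memory use still grows slowly on very large inputs.
With lxml you can additionally delete already-processed siblings through
getprevious() and getparent() if that becomes a problem.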

Stefan