Re: How to search HUGE XML with DOM?
- From: "Paul Boddie" <paul@xxxxxxxxxxxxx>
- Date: 31 Mar 2006 05:50:31 -0800
Diez B. Roggisch wrote:
The xml.dom.minidom object is too slow when parsing such a big XML file
to a DOM object, while pulldom would spend quite a long time going
through the whole database file. How can I improve the search speed?
Are there existing solutions or algorithms? Thank you for your
suggestions...
I've told you that before, and I tell you again: RDBMS is the way to go.
We've lost some context from the original post that may be relevant
here, but if populating what the original questioner calls "the
database" is an infrequent operation, then an RDBMS probably is the way
to go, in general. On the other hand, if the XML has to be parsed afresh
before each search, loading it into the database every time would incur
a lot of overhead from SQL inserts, which wouldn't be particularly
desirable.
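To make the "populate once, query often" case concrete, here is a minimal sketch using the standard library's sqlite3 and ElementTree modules. The document layout (a <db> root holding <record id="..."> elements with a <name> child), the table name and the function name load_records are all assumptions made up for illustration; nothing in the original post describes the real schema.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_records(xml_source, conn):
    """Stream hypothetical <record> elements into SQLite; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record (id TEXT PRIMARY KEY, name TEXT)"
    )
    # The index is what makes later searches by name cheap.
    conn.execute("CREATE INDEX IF NOT EXISTS record_name ON record (name)")
    count = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "record":
            conn.execute(
                "INSERT INTO record (id, name) VALUES (?, ?)",
                (elem.get("id"), elem.findtext("name")),
            )
            count += 1
            elem.clear()  # release the record's children once stored
    conn.commit()
    return count

# Tiny stand-in for the questioner's file (assumed layout).
SAMPLE = b"""<db>
  <record id="r1"><name>alpha</name></record>
  <record id="r2"><name>beta</name></record>
</db>"""

conn = sqlite3.connect(":memory:")
loaded = load_records(io.BytesIO(SAMPLE), conn)
row = conn.execute("SELECT id FROM record WHERE name = ?", ("beta",)).fetchone()
```

The insert cost is paid once per load, so this only pays off when the same data is searched many times between loads.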
There might be XML-parsers that work faster - I suppose cElementTree can
gain you some speed - but ultimately the problems are inherent in the
representation as DOM: no type-information, no indices, no nothing. Just a
huge pile of nodes in memory.
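For a one-off search where building a full tree is the bottleneck, a streaming pass is one option. The sketch below uses xml.etree.ElementTree.iterparse (in current Python the C accelerator formerly shipped as cElementTree is used automatically). The <record>/<name> layout and the function name find_names are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

def find_names(xml_source, wanted_id):
    """Yield the <name> text of each hypothetical <record> with a matching id."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "record":
            if elem.get("id") == wanted_id:
                yield elem.findtext("name")
            elem.clear()  # discard the record's children after inspecting it

# Tiny stand-in for the questioner's file (assumed layout).
SAMPLE = b"""<db>
  <record id="r1"><name>alpha</name></record>
  <record id="r2"><name>beta</name></record>
</db>"""

matches = list(find_names(io.BytesIO(SAMPLE), "r2"))
```

This is still a linear scan of the whole file on every search; it only avoids holding every node in memory at once.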
Well, I would hope that W3C DOM operations like getElementById would be
supported by some index in the implementation: that would make some of
the searches mentioned by the questioner fairly rapid, given enough
memory.
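Whether a given DOM implementation really backs getElementById with an index is something to check per library. Where it doesn't, the same effect can be had by hand, as in this sketch: one linear pass builds a dictionary, after which each lookup is a hash access. The attribute name "id", the sample document and the helper build_id_index are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_id_index(root, attr="id"):
    """Map each value of the given attribute to the element carrying it."""
    index = {}
    for elem in root.iter():  # single linear pass over the tree
        key = elem.get(attr)
        if key is not None:
            index[key] = elem
    return index

# Tiny stand-in document (assumed layout).
root = ET.fromstring(
    '<db>'
    '<record id="r1"><name>alpha</name></record>'
    '<record id="r2"><name>beta</name></record>'
    '</db>'
)

index = build_id_index(root)
hit = index["r2"].findtext("name")  # dictionary lookup, no tree traversal
```

The whole tree and the index both stay in memory, which is the "given enough memory" caveat in practice.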
So all searches are linear in the number of nodes. Of course you might be
able to create indices yourself, even devise a clever scheme to make using
them as declarative as possible. But that would in the end mean nothing but
re-creating RDBMS technology - why do that, if it's already there?
I agree that careful usage of RDBMS technology would solve the general
problems of searching large amounts of data, but the stated queries
should involve indexes and be fairly quick.
Paul