Re: How to search HUGE XML with DOM?



Sullivan WxPyQtKinter wrote:
A relational database has impressive search efficiency even when the database is
very big (several thousand or tens of thousands of records). But my
current project is based on XML, because its tree-like data structure is
much more flexible, and on DOM, which can be manipulated just like a
tree. However, how can I set up such an XML database for searching when it
contains 10,000 records (each record usually contains 10 to 30 tags) or
more?

My search needs:
1. Search and return all the records (elements) with a specific id.
2. Search and return all the records whose child nodes have a specific id
or attribute.

xml.dom.minidom is too slow when parsing such a big XML file
into a DOM object, while pulldom takes quite a long time to go
through the whole database file. How can I speed up the search?
Are there existing solutions or algorithms? Thank you for your
suggestions...
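The two search needs above can be sketched with the standard library's xml.etree.ElementTree.iterparse, which streams the file and hands you each record as soon as it is complete, instead of building a full DOM first (in Python 3 the C accelerator formerly shipped as cElementTree is used automatically). This is a minimal sketch under assumptions, not a tuned solution: it assumes each record is a <record> element directly under the root, that ids live in an "id" attribute, and that "child nodes" means direct children. The function name search_records and the sample data are hypothetical. Note that it still reads the whole file for every query; it only avoids the cost and memory of a full DOM.

```python
import io
import xml.etree.ElementTree as ET


def search_records(source, record_id=None, child_attr=None, tag="record"):
    """Yield <record> elements matching either criterion, streaming the file.

    record_id  -- match records whose own "id" attribute equals this value
    child_attr -- (name, value): match records with a direct child whose
                  attribute `name` equals `value`
    Hypothetical layout: records are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's "start"
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue  # only act once a whole record has been parsed
        hit = record_id is not None and elem.get("id") == record_id
        if not hit and child_attr is not None:
            name, value = child_attr
            hit = any(child.get(name) == value for child in elem)
        if hit:
            yield elem
        # Detach finished records from the root so memory use stays flat;
        # an element that was already yielded keeps its own children.
        root.clear()


# Hypothetical sample data in the assumed layout.
SAMPLE = b"""<records>
  <record id="1"><field id="a"/></record>
  <record id="2"><field id="b" lang="en"/></record>
  <record id="3"><field id="a"/></record>
</records>"""

print([r.get("id") for r in search_records(io.BytesIO(SAMPLE), record_id="2")])
# ['2']
print([r.get("id") for r in search_records(io.BytesIO(SAMPLE), child_attr=("id", "a"))])
# ['1', '3']
```

Because search_records is a generator, a caller that only needs the first hit can stop early without parsing the rest of the file.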

- have a look at cElementTree ?
- store your XML as persistent objects in a ZODB instance, then use a ZODB
catalog for queries ?
- index relevant data in a DB (RDBMS, Berkeley, whatever...) ?
- have a look at 4suite (http://4suite.org/index.xhtml) ?
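To illustrate the third suggestion (indexing the relevant data in a DB), here is a minimal sketch that uses the sqlite3 module from Python's standard library as the database. It makes one streaming pass over the XML, stores each record's serialized XML, and indexes both the record ids and every attribute of each record's direct children, so both searches become indexed lookups rather than a full pass over the file. The table layout, the function names (build_index, records_by_id, records_by_child_attr), and the <record>/"id" layout are hypothetical assumptions; the index has to be rebuilt or updated whenever the XML changes.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def build_index(source, conn, tag="record"):
    """One streaming pass: store each record and index its id and child attributes."""
    conn.executescript("""
        CREATE TABLE records (pk INTEGER PRIMARY KEY, id TEXT, xml TEXT);
        CREATE INDEX records_id ON records (id);
        CREATE TABLE child_attrs (record_pk INTEGER, name TEXT, value TEXT);
        CREATE INDEX child_attrs_nv ON child_attrs (name, value);
    """)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the root element's "start" event
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue
        cur = conn.execute(
            "INSERT INTO records (id, xml) VALUES (?, ?)",
            (elem.get("id"), ET.tostring(elem, encoding="unicode")),
        )
        # One row per attribute of each direct child of the record.
        conn.executemany(
            "INSERT INTO child_attrs VALUES (?, ?, ?)",
            [(cur.lastrowid, k, v) for child in elem for k, v in child.attrib.items()],
        )
        root.clear()  # drop finished records to keep memory flat
    conn.commit()


def records_by_id(conn, rid):
    """Search need 1: all records whose own id equals `rid`."""
    rows = conn.execute("SELECT xml FROM records WHERE id = ? ORDER BY pk", (rid,))
    return [ET.fromstring(xml) for (xml,) in rows]


def records_by_child_attr(conn, name, value):
    """Search need 2: all records with a direct child having name=value."""
    rows = conn.execute(
        """SELECT DISTINCT r.pk, r.xml FROM records AS r
           JOIN child_attrs AS c ON c.record_pk = r.pk
           WHERE c.name = ? AND c.value = ? ORDER BY r.pk""",
        (name, value),
    )
    return [ET.fromstring(xml) for (_, xml) in rows]


# Hypothetical sample data in the assumed layout.
SAMPLE = b"""<records>
  <record id="1"><field id="a"/></record>
  <record id="2"><field id="b" lang="en"/></record>
  <record id="3"><field id="a"/></record>
</records>"""

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the index on disk
build_index(io.BytesIO(SAMPLE), conn)
print([r.get("id") for r in records_by_id(conn, "2")])
# ['2']
print([r.get("id") for r in records_by_child_attr(conn, "id", "a")])
# ['1', '3']
```

The design choice here is to pay the parsing cost once at indexing time; each later query touches only the matching rows and re-parses just those records' XML.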

My 2 cents...
--
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in 'onurb@xxxxxxxxxxx'.split('@')])"