Re: xml + mmap cross
- From: castironpi <castironpi@xxxxxxxxx>
- Date: Thu, 4 Sep 2008 20:28:34 -0700 (PDT)
On Sep 4, 7:54 pm, alex23 <wuwe...@xxxxxxxxx> wrote:
> On Sep 4, 8:31 am, castironpi <castiro...@xxxxxxxxx> wrote:
> > Any interest in pursuing/developing/working together on a mmaped-xml
> > class? Faster, not readable in text editor.
>
> XML is text-based, so it should -always- be readable in a text editor.
> It's part of the definition, I believe.
> However, an implementation of one of the alternative binary XML
> formats would probably be very welcome.
> Fast Infoset: http://www.itu.int/rec/T-REC-X.891-200505-I/en
> EXI: http://www.w3.org/TR/2007/WD-exi-20070716/
> I don't know enough about either format to say if it would be
> possible, but an implementation that conformed to the ElementTree API
> could be a big win.
I was thinking something much less restrictive than the two links.
Since it's not text, I'm not sure it even counts as structured
markup. More generic, something like hierarchical 'tag-content-child'
pairs.
Here's what the xml.etree.ElementTree API says:
Each element has a number of properties associated with it:
- a tag which is a string identifying what kind of data this element
represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence
Since all of these would be buffer-based representations, the
attribute list would merely implement the mapping-object protocol, not
be in a true dictionary. The strings would be stored as offsets to
length-prefixed buffer segments.
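A minimal sketch of the length-prefixed segments, assuming a 4-byte little-endian length followed by UTF-8 bytes. The prefix width, the encoding, and the helper names write_string and read_string are my assumptions for illustration, not a settled format. A bytearray stands in for the map here; an mmap.mmap object accepts the same calls.

```python
import struct

# Assumption: 4-byte little-endian length prefix, then UTF-8 bytes.
_LEN = struct.Struct('<I')

def write_string(buf, offset, text):
    """Store text at offset as a length-prefixed segment.

    Returns the offset just past the segment (the next free byte).
    """
    data = text.encode('utf-8')
    _LEN.pack_into(buf, offset, len(data))
    start = offset + _LEN.size
    buf[start:start + len(data)] = data
    return start + len(data)

def read_string(buf, offset):
    """Read the length-prefixed segment that starts at offset."""
    (length,) = _LEN.unpack_from(buf, offset)
    start = offset + _LEN.size
    return bytes(buf[start:start + length]).decode('utf-8')

buf = bytearray(64)                 # stands in for an mmap.mmap object
end = write_string(buf, 0, 'abc')   # 4 bytes of length + 3 bytes of data
text = read_string(buf, 0)
```

A node or attribute then only needs to hold the integer offset of the segment, never the string itself.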
Each node would look roughly like:
tag_offset, first_attr, text_offset, tail_offset, first_child,
prev_sibling, next_sibling, parent
Attributes would look like:
key_offset, value_offset, prev_attr, next_attr, node
These are all integers representing offsets elsewhere into the map.
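The two record layouts could be sketched with the struct module roughly as follows. The field order is taken from the lists above; the 4-byte field width, the use of 0 to mean "none", and the helper names (write_node, read_node, iter_children) are assumptions made for the example.

```python
import mmap
import struct
from collections import namedtuple

# Assumption: every field is a 4-byte unsigned offset, and 0 means "none".
NODE = struct.Struct('<8I')
ATTR = struct.Struct('<5I')

Node = namedtuple('Node', 'tag_offset first_attr text_offset tail_offset '
                          'first_child prev_sibling next_sibling parent')
Attr = namedtuple('Attr', 'key_offset value_offset prev_attr next_attr node')

def write_node(buf, offset, node):
    """Pack a Node record into the buffer at offset."""
    NODE.pack_into(buf, offset, *node)

def read_node(buf, offset):
    """Unpack the Node record stored at offset."""
    return Node(*NODE.unpack_from(buf, offset))

def iter_children(buf, offset):
    """Yield the offsets of a node's children by following next_sibling."""
    child = read_node(buf, offset).first_child
    while child:
        yield child
        child = read_node(buf, child).next_sibling

buf = mmap.mmap(-1, 4096)   # anonymous map; a file-backed map behaves the same
# Offset 0 is reserved as "none", so the records start at 32.
root, b1, b2 = 32, 64, 96
write_node(buf, root, Node(0, 0, 0, 0, b1, 0, 0, 0))
write_node(buf, b1, Node(0, 0, 0, 0, 0, 0, b2, root))
write_node(buf, b2, Node(0, 0, 0, 0, 0, b1, 0, root))
children = list(iter_children(buf, root))
```

Walking the sibling links is what a getchildren()-style call would do underneath; no Python list of children is ever stored.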
A short observation:
>>> import xml.etree.ElementTree as e
>>> a= e.XML( '<a><b>abc</b></a>' )
>>> a.getchildren()[0].text
'abc'
>>> a.getchildren()[0].text= 'ab<'
>>> e.tostring(a)
'<a><b>ab&lt;</b></a>'
>>> e.XML(_)
<Element a at c2c3f0>
>>> _.getchildren()[0].text
'ab<'
The current implementation supports round trips between special
characters '<' and markup '&lt;', which I propose to support as well.
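One way to keep that round trip, sketched under the assumption that the buffer holds the raw characters and escaping happens only when text XML is emitted or parsed. xml.sax.saxutils is a standard-library module; the variable names are illustrative.

```python
from xml.sax.saxutils import escape, unescape

raw = 'ab<'                          # what the buffer segment would hold
markup = '<b>%s</b>' % escape(raw)   # escape only when serializing to text
restored = unescape(markup[3:-4])    # strip '<b>' and '</b>', then unescape
```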
Of course, you'd have to garbage collect removed nodes by hand, on any
deletions.
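Collecting by hand could be as simple as a free list of fixed-size slots. This is only a sketch: the SlotAllocator class, the 32-byte slot size, and the trick of keeping the next-free offset in a freed slot's first four bytes are all assumptions, and variable-length string segments would need something more involved.

```python
import struct

SLOT = 32                      # assumed fixed record size in bytes
_PTR = struct.Struct('<I')     # a freed slot stores the next free offset here

class SlotAllocator:
    """Hypothetical free-list allocator over a flat buffer."""

    def __init__(self, buf, start=SLOT):
        self.buf = buf
        self.next_unused = start   # offset 0 is reserved as "none"
        self.free_head = 0         # 0 means the free list is empty

    def alloc(self):
        """Return the offset of a usable slot, reusing freed slots first."""
        if self.free_head:
            offset = self.free_head
            (self.free_head,) = _PTR.unpack_from(self.buf, offset)
            return offset
        offset = self.next_unused
        if offset + SLOT > len(self.buf):
            raise MemoryError('buffer full')
        self.next_unused += SLOT
        return offset

    def free(self, offset):
        """Push a slot onto the free list."""
        _PTR.pack_into(self.buf, offset, self.free_head)
        self.free_head = offset

buf = bytearray(256)           # stands in for an mmap.mmap object
slots = SlotAllocator(buf)
a = slots.alloc()
b = slots.alloc()
slots.free(a)                  # e.g. after removing a node
c = slots.alloc()              # reuses the slot that a occupied
```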
Also, poss. change subject to: ElementTree + mmap cross.