[ANNOUNCEMENT] VTD-XML released under GPL

From: Jimmy Zhang (jzhang_at_ximpleware.com)
Date: 06/30/04


Date: 30 Jun 2004 07:23:42 GMT


  I am pleased to announce that version 0.5 of VTD-XML -- a new,
non-extractive, Java-based XML processing API licensed under the GPL
-- is now freely available on sourceforge.net. For source code,
documentation, a detailed description of the API, and code examples,
please visit

  http://vtd-xml.sf.net

  Capable of random access, VTD-XML attempts to be both memory
efficient and high performance. The starting point of this project is
the observation that, for XML documents that don't declare entities
in a DTD, tokenization can be done by recording only the starting
offset and length of each token. A discussion of this subject appeared
in a recent article on xml.com
(http://www.xml.com/pub/a/2004/05/19/parsing.html).
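
  To illustrate the offset/length idea, here is a minimal sketch. It
is not taken from the VTD-XML source; the class and method names are
made up for this example, and the offsets are counted by hand for one
tiny document. The point is only that a token can be described by two
integers and turned into a string when, and if, it is needed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not the VTD-XML API.
final class OffsetLengthTokenSketch {

    // Materialize a token as a String only on demand.
    static String tokenText(byte[] doc, int offset, int length) {
        return new String(doc, offset, length, StandardCharsets.UTF_8);
    }

    // Compare a token against an expected name without creating a String
    // for the token itself.
    static boolean tokenEquals(byte[] doc, int offset, int length, String expected) {
        byte[] e = expected.getBytes(StandardCharsets.UTF_8);
        if (e.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (doc[offset + i] != e[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<msg>hello</msg>".getBytes(StandardCharsets.UTF_8);
        // Offsets counted by hand: element name at 1 (3 bytes), text at 5 (5 bytes).
        System.out.println(tokenText(doc, 1, 3));
        System.out.println(tokenText(doc, 5, 5));
        System.out.println(tokenEquals(doc, 1, 3, "msg"));
    }
}
```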

  The core technology of VTD-XML is a binary format specification
called Virtual Token Descriptor (VTD). A VTD record is a 64-bit integer
that encodes the starting offset, length, type and nesting depth of a
token in an XML document. Because VTD records don't contain the actual
token content, they work alongside the original XML document, which
is kept intact in memory by the processing model.
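
  As a rough sketch of how four fields can share one 64-bit integer,
consider the following. The field widths and bit positions used here
(30-bit offset, 20-bit length, 8-bit depth, 4-bit type) are assumptions
chosen for illustration; please consult the VTD specification on the
project site for the authoritative layout. The class and method names
are likewise hypothetical.

```java
// Hypothetical bit layout for illustration only; see the VTD
// specification for the real one.
final class VtdRecordSketch {
    static final int LENGTH_SHIFT = 32; // length occupies bits 32..51
    static final int DEPTH_SHIFT = 52;  // depth occupies bits 52..59
    static final int TYPE_SHIFT = 60;   // type occupies bits 60..63

    static final long OFFSET_MASK = 0x3FFFFFFFL; // 30 bits
    static final int LENGTH_MASK = 0xFFFFF;      // 20 bits
    static final int DEPTH_MASK = 0xFF;          // 8 bits
    static final int TYPE_MASK = 0xF;            // 4 bits

    // Pack the four fields into a single primitive long.
    static long pack(int type, int depth, int length, int offset) {
        return ((long) (type & TYPE_MASK) << TYPE_SHIFT)
             | ((long) (depth & DEPTH_MASK) << DEPTH_SHIFT)
             | ((long) (length & LENGTH_MASK) << LENGTH_SHIFT)
             | (offset & OFFSET_MASK);
    }

    static int type(long record)   { return (int) (record >>> TYPE_SHIFT) & TYPE_MASK; }
    static int depth(long record)  { return (int) (record >>> DEPTH_SHIFT) & DEPTH_MASK; }
    static int length(long record) { return (int) (record >>> LENGTH_SHIFT) & LENGTH_MASK; }
    static int offset(long record) { return (int) (record & OFFSET_MASK); }

    public static void main(String[] args) {
        long record = pack(2, 3, 5, 1024);
        System.out.println(type(record) + " " + depth(record) + " "
                + length(record) + " " + offset(record));
    }
}
```

  Since the record is a primitive long rather than an object, reading
a field is a shift and a mask, with no pointer to follow.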

  VTD's memory-conserving features can be summarized as follows:

  * Avoid per-object overhead -- In many VM-based object-oriented
    programming languages, each object allocation incurs a small amount
    of memory overhead. A VTD record avoids this overhead because
    it is not an object.
  * Bulk allocation of storage -- Fixed in length, VTD records can be
    stored in large memory blocks, which are more efficient to allocate
    and garbage-collect. By allocating one large array for 4096 VTD
    records, one incurs the per-array overhead (16 bytes in JDK 1.4)
    only once across 4096 records, reducing the per-record overhead to
    about 0.004 bytes.
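
  The bulk-allocation point can be sketched as a buffer that grows one
4096-record block at a time. This is an assumed design written for this
example, not the buffer class shipped with VTD-XML; the names are
hypothetical, and it uses generics for brevity even though they are not
available in JDK 1.4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical growable store of 64-bit records, one long[] per block.
final class VtdBufferSketch {
    static final int BLOCK_SIZE = 4096; // records per block, as in the text

    private final List<long[]> blocks = new ArrayList<>();
    private int size = 0;

    // Append a record, allocating a new block only when the last one is full.
    void append(long record) {
        int indexInBlock = size % BLOCK_SIZE;
        if (indexInBlock == 0) {
            blocks.add(new long[BLOCK_SIZE]);
        }
        blocks.get(size / BLOCK_SIZE)[indexInBlock] = record;
        size++;
    }

    // Random access by record index.
    long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return blocks.get(index / BLOCK_SIZE)[index % BLOCK_SIZE];
    }

    int size() { return size; }

    int blockCount() { return blocks.size(); }

    public static void main(String[] args) {
        VtdBufferSketch buffer = new VtdBufferSketch();
        for (int i = 0; i < 5000; i++) {
            buffer.append(i);
        }
        // 5000 records need only two array allocations.
        System.out.println(buffer.size() + " records in "
                + buffer.blockCount() + " blocks");
    }
}
```

  Existing blocks are never copied when the buffer grows, and the
array header is paid once per 4096 records rather than once per token.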

  Our benchmark indicates that VTD-XML processes XML at a performance
level similar to (and often better than) SAX with a null content handler.
Memory usage is typically 1.3 to 1.6 times the size of the document,
where 1.0 is the document itself.

  Other features included in this release are:

  * Incremental update -- VTD-XML allows one to modify the content of
    an XML document without touching the unaffected parts of the
    document.
  * Content extraction -- VTD-XML also allows one to pull an element
    out of an XML document in its serialized form. This can be an
    important feature for partial signing/encryption of SOAP payloads
    under WS-Security.
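
  Both features follow from keeping the original bytes intact. Here is
a minimal sketch, again with hypothetical names and hand-counted
offsets rather than the actual VTD-XML API: extraction is a byte-range
copy, and an update copies the bytes before and after the changed token
verbatim.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helpers for illustration, not the VTD-XML API.
final class FragmentSketch {

    // Pull out an element exactly as it was serialized in the document.
    static byte[] extract(byte[] doc, int offset, int length) {
        return Arrays.copyOfRange(doc, offset, offset + length);
    }

    // Replace one byte range; everything else is copied unchanged.
    static byte[] replace(byte[] doc, int offset, int length, byte[] replacement) {
        byte[] out = new byte[doc.length - length + replacement.length];
        System.arraycopy(doc, 0, out, 0, offset);
        System.arraycopy(replacement, 0, out, offset, replacement.length);
        System.arraycopy(doc, offset + length, out, offset + replacement.length,
                doc.length - offset - length);
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "<a><b>1</b><c>2</c></a>".getBytes(StandardCharsets.UTF_8);
        // Offsets counted by hand: <b>1</b> spans bytes 3..10; the text "2" is at 14.
        byte[] fragment = extract(doc, 3, 8);
        byte[] updated = replace(doc, 14, 1, "42".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(fragment, StandardCharsets.UTF_8));
        System.out.println(new String(updated, StandardCharsets.UTF_8));
    }
}
```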

  In upcoming releases, we plan to add persistence support so that one
can save/load VTD records to/from disk along with the XML documents,
avoiding repeated parsing in read-only situations. XPath support is
also on the development roadmap. However, we would like to collect as
many suggestions and bug reports as possible before taking the next
step.

  Your input and suggestions are very important to making VTD-XML a
truly useful XML processor.

Thanks,

Jimmy Zhang


