Fast XML filtering

From: Martin Kofoed (inzide_at_hot.mail.com)
Date: 11/21/03


Date: Fri, 21 Nov 2003 11:16:53 +0100


Hi,

I'm writing a DLL that must be able to process XML data of up to 10 MB
per call.

Basically, the caller passes the XML as a string, plus another string
containing a start tag that indicates which parts to filter out of the
XML (the start tag, the corresponding end tag, AND all the elements and
data between them).

Of course, "performance" is the key word here. I started out sweeping the
string using the standard string handling functions, but I'm not impressed
with the performance.

Which approach would be best from a performance point of view?

My current solution is something to this effect:

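result := '';
// find first starttag (if any):
position := pos(starttag, xml);
while position > 0 do
begin
  // move everything before starttag to the result:
  result := result + copy(xml, 1, position - 1);
  // delete everything before starttag:
  xml := copy(xml, position, length(xml));
  // find corresponding endtag:
  position := pos(endtag, xml);
  // no endtag found: stop and keep the remainder as it is
  if position = 0 then
    break;
  // delete endtag + all data before it:
  xml := copy(xml, position + length(endtag), length(xml));
  // find next starttag (if any):
  position := pos(starttag, xml);
end;
// keep whatever follows the last endtag:
result := result + xml;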
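
For comparison, here is a rough single-pass sketch of the same filter. Part
of the cost in the loop above probably comes from copy() building a new
string from the rest of xml on every pass, and from result being extended
piece by piece. The sketch leaves the input untouched, searches forward with
PosEx, and writes into one preallocated buffer. It assumes PosEx from the
StrUtils unit is available (Delphi 7 or Free Pascal) and that the filtered
elements are not nested inside each other. StripElements is a made-up name,
not an existing library routine.

```pascal
program FilterSketch;
{$IFDEF FPC}{$mode delphi}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns Xml with every StartTag..EndTag span removed.
// Assumption: matching elements are not nested inside each other.
function StripElements(const Xml, StartTag, EndTag: string): string;
var
  SrcPos, TagPos, EndPos, OutLen, ChunkLen: Integer;
begin
  // The output is never longer than the input, so allocate once.
  SetLength(Result, Length(Xml));
  OutLen := 0;
  SrcPos := 1;
  TagPos := PosEx(StartTag, Xml, SrcPos);
  while TagPos > 0 do
  begin
    EndPos := PosEx(EndTag, Xml, TagPos + Length(StartTag));
    if EndPos = 0 then
      Break; // no closing tag: keep the rest unchanged
    // Copy the text between the previous cut and this start tag.
    ChunkLen := TagPos - SrcPos;
    if ChunkLen > 0 then
    begin
      Move(Xml[SrcPos], Result[OutLen + 1], ChunkLen * SizeOf(Char));
      Inc(OutLen, ChunkLen);
    end;
    // Resume scanning just past the end tag.
    SrcPos := EndPos + Length(EndTag);
    TagPos := PosEx(StartTag, Xml, SrcPos);
  end;
  // Copy whatever follows the last removed element.
  ChunkLen := Length(Xml) - SrcPos + 1;
  if ChunkLen > 0 then
  begin
    Move(Xml[SrcPos], Result[OutLen + 1], ChunkLen * SizeOf(Char));
    Inc(OutLen, ChunkLen);
  end;
  // Trim the buffer to the number of characters actually written.
  SetLength(Result, OutLen);
end;

begin
  // prints <a><c>y</c></a>
  WriteLn(StripElements('<a><b>x</b><c>y</c><b>z</b></a>', '<b>', '</b>'));
end.
```

Each character of the input is copied at most once here, so the work should
grow roughly linearly with the size of the XML rather than with size times
number of matches. A start tag that carries attributes would need a looser
match than the plain substring search used in this sketch.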

--
Martin Kofoed