Re: XML SAX parser bug?




Fredrik Lundh wrote:

> mitsura@xxxxxxxxx wrote:
>
> > I think I ran into a bug in the XML SAX parser.
> >
> > Part of my program consists of reading a rather large XML file (about
> > 10 MB) containing a few thousand elements.
> > I have the following problem: sometimes the SAX parser misreads a
> > line.
> > Let me explain: the XML file contains a few thousand lines like this:
> > "
> > <TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>
> > "
> > where 'n91c90a.cmc.com' is the name of a system and thus changes per
> > system.
> > In a few cases, the SAX parser misreads the line. The parser sometimes
> > splits the characters of the line into:
> > "WINOSSPI:Storage@@n" and "91c90a.cmc.com".
> > I put a 'print characters' line in the 'characters' method of the
> > parser; that is how I found out.
> > It only happens for a few of the thousand lines, but you can imagine
> > that it is very annoying.
> >
> > I checked for errors in the XML file but the file seems ok.
> >
> > Is this a bug or am I doing something wrong?
>
> It's not a bug; the parser is free to split up character runs (due to buffering,
> entities or character references, etc.). It's up to you to merge character runs
> into strings.
>
> </F>
Thanks for the feedback,

but how do I detect that the parser has split up the characters? I guess
I need to detect it in order to reconstruct the complete string.
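You do not need to detect the split at all. The Python documentation for ContentHandler.characters() says SAX parsers may deliver contiguous character data in one chunk or in several. The usual pattern is to collect every chunk in a buffer and only use the text when endElement() fires for the element you care about.

Below is a minimal sketch of that pattern. It assumes the element is called TargetRef, as in the original post. The names TargetRefHandler and parse_in_chunks are hypothetical. The document is deliberately fed in small pieces to provoke split character runs, and the joined result is still correct.

```python
import xml.sax


class TargetRefHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the full text of each <TargetRef>.

    characters() may be called several times for one text node, so the
    pieces are buffered and only joined in endElement().
    """

    def __init__(self):
        super().__init__()
        self._chunks = []        # pieces of the current TargetRef's text
        self._in_target = False  # True while inside a <TargetRef> element
        self.targets = []        # completed, merged strings

    def startElement(self, name, attrs):
        if name == "TargetRef":
            self._in_target = True
            self._chunks = []

    def characters(self, content):
        # Do not interpret `content` here: it may be only part of the text.
        if self._in_target:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "TargetRef":
            # Now the run is complete; merge the pieces into one string.
            self.targets.append("".join(self._chunks))
            self._in_target = False


def parse_in_chunks(text, chunk_size):
    """Hypothetical helper: feed the document in small byte chunks."""
    data = text.encode("utf-8")
    handler = TargetRefHandler()
    parser = xml.sax.make_parser()  # an incremental (expat-based) parser
    parser.setContentHandler(handler)
    for i in range(0, len(data), chunk_size):
        parser.feed(data[i:i + chunk_size])
    parser.close()
    return handler.targets


if __name__ == "__main__":
    doc = ("<Root>"
           "<TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>"
           "</Root>")
    print(parse_in_chunks(doc, chunk_size=20))
```

The same approach also handles splits caused by entity or character references such as `&amp;`. For those, expat typically reports the expanded character as a separate chunk.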


