Re: Partly erratic wrong behaviour, Python 3, lxml



Stefan Behnel writes:
> Jussi Piitulainen, 04.03.2010 22:40:
>> Stefan Behnel writes:
>>> Jussi Piitulainen, 04.03.2010 11:46:
>>>> I am observing weird semi-erratic behaviour that involves Python 3
>>>> and lxml, is extremely sensitive to changes in the input data, and
>>>> only occurs when I name a partial result. I would like some help
>>>> with this, please. (Python 3.1.1; GNU/Linux; how do I find the lxml
>>>> version?)
>>>
>>> Here's how to find the version:
>>>
>>> http://codespeak.net/lxml/FAQ.html#i-think-i-have-found-a-bug-in-lxml-what-should-i-do
>>
>> Ok, thank you. Here are the results:
>>
>>     >>> print(et.LXML_VERSION, et.LIBXML_VERSION,
>>     ...       et.LIBXML_COMPILED_VERSION, et.LIBXSLT_VERSION,
>>     ...       et.LIBXSLT_COMPILED_VERSION)
>>     (2, 2, 4, 0) (2, 6, 26) (2, 6, 26) (1, 1, 17) (1, 1, 17)
>
> I can't reproduce this with the latest lxml trunk (and Py3.2 trunk)
> and libxml2 2.7.6, even after running your test script for quite a
> while. I'd try to upgrade the libxml2 version.

Thank you very much. I suppose that is good news. It's a big server with
many users, so I will ask the administrators to consider an upgrade when
I get around to it.

It turns out that the lxml documentation warns against using libxml2
version 2.6.27 with XPath, and that version is just a bit newer than the
one we have. Taking that cue, I seem to have found a workaround: I
replaced the XPath expression with findall(titlef), where

titlef = ( '//{http://www.openarchives.org/OAI/2.0/}record'
'//{http://purl.org/dc/elements/1.1}title' )
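The lxml warning names only libxml2 2.6.27, while this server runs 2.6.26 and Stefan's working setup runs 2.7.6. A script that has to run on several machines could choose the findall() route based on the libxml2 version that lxml reports at runtime. This is a minimal sketch of such a guard; the name should_avoid_xpath and the cut-off at 2.7.0 are hypothetical assumptions drawn from this thread, not something the lxml documentation states.

```python
# Sketch: decide whether to steer clear of lxml's XPath on a given libxml2.
# ASSUMPTION: treating every libxml2 release before 2.7.0 as suspect is a
# guess based on this thread (2.6.26 misbehaved, 2.7.6 did not); the lxml
# documentation itself only singles out 2.6.27.

ASSUMED_SAFE_LIBXML = (2, 7, 0)  # hypothetical cut-off, not from the lxml docs


def should_avoid_xpath(libxml_version, safe_from=ASSUMED_SAFE_LIBXML):
    """Return True if XPath should be avoided with this libxml2 version.

    libxml_version is a version tuple such as lxml.etree.LIBXML_VERSION,
    e.g. (2, 6, 26).
    """
    version = tuple(libxml_version[:3])
    if version == (2, 6, 27):  # the release the lxml docs warn about for XPath
        return True
    return version < tuple(safe_from)


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed")
    else:
        # Print the runtime libxml2 version and whether to use findall() instead.
        print(etree.LIBXML_VERSION, should_avoid_xpath(etree.LIBXML_VERSION))
```

A caller would then use findall(titlef) when the guard returns True and keep the original XPath expression otherwise.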

In the previously broken naming() function I now have:

result = etree.parse(BytesIO(body))
n = len(result.findall(titlef))

And in the previously working nesting() function:

n = len(etree.parse(BytesIO(body)).findall(titlef))
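To try the workaround without the original test data, here is a small self-contained sketch. The two functions are simplified stand-ins named after naming() and nesting() above (the real ones are not shown in the post), and the sample document is made up. It falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since both accept the same {namespace}tag path syntax in findall(). The path equals titlef except for a leading ".": recent versions of both libraries treat a path starting with "/" on a parsed tree as relative to the root element and emit a FutureWarning, so writing ".//" avoids the warning. The namespace URIs are copied from titlef; note that the Dublin Core element namespace is normally written with a trailing slash (http://purl.org/dc/elements/1.1/), and findall() compares URIs exactly, so it is worth checking which form the real data uses.

```python
from io import BytesIO

try:
    from lxml import etree  # the library used in this thread
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1"  # as written in titlef (no trailing slash)

# Same path as titlef, with an explicit leading "." (relative to the root).
TITLE_PATH = ".//{%s}record//{%s}title" % (OAI, DC)

# Made-up sample response: two records holding three titles in total.
SAMPLE = b"""<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1">
  <ListRecords>
    <record><metadata><dc:title>First</dc:title></metadata></record>
    <record><metadata><dc:title>Second</dc:title><dc:title>Alt</dc:title></metadata></record>
  </ListRecords>
</OAI-PMH>
"""


def naming(body):
    # Hypothetical stand-in: bind the parsed tree to a name, then search it.
    result = etree.parse(BytesIO(body))
    return len(result.findall(TITLE_PATH))


def nesting(body):
    # Hypothetical stand-in: parse and search in a single expression.
    return len(etree.parse(BytesIO(body)).findall(TITLE_PATH))


if __name__ == "__main__":
    print(naming(SAMPLE), nesting(SAMPLE))
```

Both stand-ins should agree; with the sample above each finds the three titles.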

With these changes, the test script consistently gives the result I
expect, and the more complicated real test script where I first ran into
the problem also appears to work without a hitch. So, this works.
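Since the original symptom was semi-erratic, "consistently" can be made measurable by calling the counting function many times on the same input and tallying the distinct results. This is a minimal sketch; repeat_and_tally is a hypothetical helper, and in practice func would be something like naming() applied to the fetched body. It only catches variation within one process; if the results differ between separate runs of the script, the tally has to be collected across runs instead.

```python
from collections import Counter


def repeat_and_tally(func, arg, runs=1000):
    """Call func(arg) `runs` times and count how often each result occurs.

    A well-behaved, deterministic func yields exactly one key; more than
    one key means the behaviour is erratic.
    """
    return Counter(func(arg) for _ in range(runs))


if __name__ == "__main__":
    # Stand-in for naming()/nesting(), which take the raw response body.
    tally = repeat_and_tally(len, b"<doc/>", runs=100)
    print(tally, "consistent" if len(tally) == 1 else "erratic")
```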

The other, broken behaviour is totally scary, though.