Re: Code that ought to run fast, but can't due to Python limitations.



2009/7/5 Hendrik van Rooyen <mail@xxxxxxxxxxxxxxx>:
> I cannot see how you could avoid a python function call - even if he
> bites the bullet and implements my laborious scheme, he would still
> have to fetch the next character to test against, inside the current state.
>
> So if it is the function calls that is slowing him down, I cannot
> imagine a solution using less than one per character, in which
> case he is screwed no matter what he does.

A simple solution may be to read the whole input HTML file into a
string. This potentially requires lots of memory, but I suspect that
by far the most common use case for this parser is to build a DOM (or
DOM-like) tree of the whole document. This tree usually requires much
more memory than the HTML source itself.
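
To illustrate the idea, here is a minimal sketch (not the parser under
discussion) that reads the input into one string and lets a compiled
regular expression do the scanning, so the Python-level work happens once
per token rather than once per character. The names TOKEN_RE, tokenize
and tokenize_file are made up for this example, the pattern is
deliberately far too simple for real HTML, and it assumes Python 3:

```python
import re

# Hypothetical, highly simplified token pattern: either a tag or a run of text.
# Real HTML needs many more cases (comments, CDATA, '>' inside attributes...).
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")


def tokenize(source):
    """Yield (kind, value) pairs from an HTML string held fully in memory."""
    # finditer scans the string in C; Python code runs once per token.
    for match in TOKEN_RE.finditer(source):
        value = match.group(0)
        kind = "tag" if value.startswith("<") else "text"
        yield kind, value


def tokenize_file(path, encoding="utf-8"):
    """Read the whole file into one string, then tokenize it."""
    with open(path, encoding=encoding) as f:
        source = f.read()
    return list(tokenize(source))
```

Whether this actually beats the streaming implementation depends on the
input, so it is worth measuring rather than assuming.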

So, if the code duplication is acceptable, I suggest keeping this
implementation for cases where the input is extremely big *AND* the
whole program will work on it in a streaming fashion, not just the
parser itself.

Then write a simpler and faster parser for the more common case when
the data is not huge *OR* the user will keep the whole document in
memory anyway (e.g. in a tree).

Also: profile, profile a lot. HTML pages are very strange beasts and
the bottlenecks may be in innocent-looking places!
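
As a starting point, something like the following sketch wraps a call in
the standard cProfile module and returns the ten most expensive entries
by cumulative time. profile_call is a helper name invented for this
example, and count_tags is only a stand-in for the real parser entry
point:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def count_tags(source):
    """Hypothetical stand-in for the parser being profiled."""
    return source.count("<")


if __name__ == "__main__":
    _, report = profile_call(count_tags, "<a><b>" * 100000)
    print(report)
```

Run it on several real pages, not just one: the hot spots can differ a
lot from document to document.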

--
Lino Mastrodomenico