Re: Code that ought to run fast, but can't due to Python limitations.
- From: Lino Mastrodomenico <l.mastrodomenico@xxxxxxxxx>
- Date: Sun, 5 Jul 2009 16:25:31 +0200
2009/7/5 Hendrik van Rooyen <mail@xxxxxxxxxxxxxxx>:
> I cannot see how you could avoid a python function call - even if he
> bites the bullet and implements my laborious scheme, he would still
> have to fetch the next character to test against, inside the current state.
> So if it is the function calls that is slowing him down, I cannot
> imagine a solution using less than one per character, in which
> case he is screwed no matter what he does.
A simple solution may be to read the whole input HTML file into a
string. This potentially requires lots of memory, but I suspect that
by far the most common use case for this parser is to build a DOM (or
DOM-like) tree of the whole document. This tree usually requires much
more memory than the HTML source itself.
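To illustrate why having the whole document in one string helps, here is a minimal sketch. The name tokenize and the token format are hypothetical, not part of the parser under discussion, and this is not a conforming HTML tokenizer: it ignores comments, CDATA, and '>' inside quoted attribute values. The point is only that str.find() scans many characters in a single call instead of one Python-level call per character.

```python
def tokenize(html):
    """Split an in-memory HTML string into ("text", s) and ("tag", s) tokens.

    Hypothetical helper for illustration only. It does not handle
    comments, CDATA, or '>' inside quoted attribute values.
    """
    tokens = []
    pos = 0
    n = len(html)
    while pos < n:
        # One call scans ahead to the next '<'; no per-character call.
        lt = html.find("<", pos)
        if lt == -1:
            tokens.append(("text", html[pos:]))
            break
        if lt > pos:
            tokens.append(("text", html[pos:lt]))
        gt = html.find(">", lt + 1)
        if gt == -1:
            # Unterminated tag: treat the remainder as text.
            tokens.append(("text", html[lt:]))
            break
        tokens.append(("tag", html[lt:gt + 1]))
        pos = gt + 1
    return tokens
```

The input string could come from a single read() on the opened file, so the per-character work happens inside the string methods rather than in Python function calls.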
So, if the code duplication is acceptable, I suggest keeping this
implementation for cases where the input is extremely big *AND* the
whole program will work on it in a streaming fashion, not just the
parser itself.
Then write a simpler and faster parser for the more common case when
the data is not huge *OR* the user will keep the whole document in
memory anyway (e.g. in a tree).
Also: profile, profile a lot. HTML pages are very strange beasts and
the bottlenecks may be in innocent-looking places!
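As a starting point for profiling, here is a small sketch using the standard cProfile and pstats modules. The helper profile_call and the stand-in function count_tags are hypothetical names; in practice you would pass the parser's real entry point and a representative HTML page.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return (result, text report).

    The report lists the ten entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


def count_tags(html):
    # Hypothetical stand-in for a parser entry point.
    return html.count("<")
```

Calling profile_call(count_tags, page_source) and printing the second element of the returned tuple shows where the time goes. Sorting by cumulative time tends to point at the high-level functions worth examining first.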
--
Lino Mastrodomenico