Re: Speedup of regexp required for Tcl-Scanner



On Apr 29, 1:00 pm, "Dr. Detlef Groth" <dgr...@xxxxxx> wrote:
Probably not. As I need this chunked behaviour and stepping through
different states.
The code on top should be fully functional.

Sorry, but I won't be diving into this. I was trying to help you
separate the time spent in the real regexp engine from the time spent
outside it. Without this analysis you cannot ask for a "speedup of
regexp" as the title says.
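
A minimal sketch of such a measurement, using Tcl's built-in time command. The test data and the buffer-chopping step are assumptions standing in for the real scanner, which is not shown here; only the comparison between the two timings matters.

```tcl
# Hypothetical test data: 2000 short lines standing in for the real input.
set data [string repeat "some sample text\n" 2000]
set re {^[^\n]+|\n}

# Cost of one bare regexp call on the whole string (engine only).
set engine [lindex [time {regexp -- $re $data match} 1000] 0]

# Cost of one regexp call plus typical scanner bookkeeping:
# copy the buffer, match, then cut the match off the front.
set step [lindex [time {
    set buf $data
    regexp -- $re $buf match
    set buf [string range $buf [string length $match] end]
} 1000] 0]

puts [format "regexp only: %.2f us, regexp + buffer handling: %.2f us" \
    $engine $step]
```

If the second figure is much larger than the first, the time is going into the code around the regexp rather than into the regexp engine.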

It also seems that it is not pathologically slow, but simply what can
currently be achieved with plain Tcl. I got a boost by changing the reg4
line from
set reg4 {^.|\n}
to
set reg4 {^[^\n]+|\n}
so that each match grabs as much as possible.
This finally gives me one order of magnitude. As I can see now:

3 seconds with 128 bytes and 10 seconds with 40*128 bytes. So I should
try to grab up to the end of the line if possible.
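
To illustrate why the second pattern helps, here is a rough sketch that counts how many regexp calls a simple scanner loop needs to consume a string with each pattern. The countTokens proc and the sample text are hypothetical; the real scanner has more states and patterns.

```tcl
# Count how many matches are needed to consume $text from the front.
# Both patterns below always match a non-empty prefix, so the loop ends.
proc countTokens {re text} {
    set n 0
    while {$text ne "" && [regexp -- $re $text match]} {
        # Drop the matched prefix and continue with the remainder.
        set text [string range $text [string length $match] end]
        incr n
    }
    return $n
}

# Hypothetical sample: 100 lines of 12 characters each (newline included).
set sample [string repeat "hello world\n" 100]

set perChar [countTokens {^.|\n} $sample]
set perLine [countTokens {^[^\n]+|\n} $sample]
puts "one character per match: $perChar calls"
puts "one line per match:      $perLine calls"
```

The first pattern consumes one character per call, the second consumes a whole line body or a newline per call, so the number of loop iterations drops roughly by the average line length.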

Whatever. It's not clear whether you are requesting assistance for
optimization of your regexps or trying to draw attention to an
unexpected algorithmic worst case.

In any case, from a macroscopic look, I have the impression that
you're trying to reimplement line-oriented prefix parsing (considering
the frequency of ^ and \n in your regexp) within an awkward block-
based parsing loop. I'd suggest instead:

while {[gets stdin line] >= 0} {
    switch -regexp -- $line {
        {^Someprefix} {...}
        ...
    }
}
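
A small self-contained sketch of that dispatch, for illustration only. The prefixes, the classifyLine proc, and the sample text are made up; the lines come from a string here so the snippet runs without input, but the same proc can be called from a gets loop on a channel.

```tcl
# Classify one line by its prefix. The patterns are hypothetical examples.
proc classifyLine {line} {
    switch -regexp -- $line {
        {^#}      { return comment }
        {^proc\s} { return procdef }
        {^\s*$}   { return blank }
        default   { return other }
    }
}

# Sample input held in a string instead of being read from stdin.
set text "# header\nproc foo {} {}\n\nputs hi"
set kinds {}
foreach line [split $text \n] {
    lappend kinds [classifyLine $line]
}
puts $kinds
```

Each line is matched once against a few anchored patterns, so there is no buffer to chop and no state to carry between chunks.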

-Alex

Thanks,
Detlef

Can you isolate one single large string and one single regexp
exhibiting this pathological slowness?

-Alex
