perl multithreading performance
- From: dniq00@xxxxxxxxx
- Date: Wed, 27 Aug 2008 12:59:36 -0700 (PDT)
Hello, oh almighty perl gurus!
I'm trying to implement multithreaded processing for the humongous
amount of logs that I'm currently processing in 1 process on a 4-CPU
server.
For each line, the script checks whether the line contains a GET
request. If it does, the script goes through a list of pre-compiled
regular expressions, trying to find one that matches. Once a match is
found, it applies a second, more complex regexp associated with that
match to extract data from the line. I split this into two separate
matches because only about 30% of all lines will match, and I don't
want to run the complex extraction regexp on lines I already know
won't match. The goal is to count how many lines matched each specific
regexp. The end result is a hash whose keys are the data extracted
from the line by the second regexp and whose values are the number of
matches.
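To make the description above concrete, here is a minimal sketch of
the two-stage matching. The rule table, both sets of patterns, the
sample log lines, and the names @rules and count_line are all invented
for illustration; the real script's regexps are not shown in this
post. It also assumes that matching stops at the first rule whose
filter matches.

```perl
use strict;
use warnings;

# Hypothetical rule table: a cheap pre-compiled filter paired with a
# costlier extraction pattern. Both patterns are made up for this sketch.
my @rules = (
    { filter => qr{/images/}, extract => qr{GET\s+(/images/[^\s?]+)} },
    { filter => qr{/api/},    extract => qr{GET\s+(/api/\w+)} },
);

my %counters;    # extracted data => number of matching lines

sub count_line {
    my ($line) = @_;
    return unless index( $line, 'GET' ) >= 0;    # skip non-GET lines cheaply
    foreach my $rule (@rules) {
        next unless $line =~ $rule->{filter};    # stage 1: cheap filter
        if ( $line =~ $rule->{extract} ) {       # stage 2: extract the key
            $counters{$1}++;
        }
        last;    # assumption: stop at the first rule whose filter matched
    }
    return;
}

# Made-up sample lines standing in for the real logs.
count_line($_) for (
    qq{1.2.3.4 - - "GET /images/logo.png HTTP/1.1" 200\n},
    qq{1.2.3.4 - - "GET /images/logo.png HTTP/1.1" 304\n},
    qq{1.2.3.4 - - "GET /api/users?id=7 HTTP/1.1" 200\n},
    qq{1.2.3.4 - - "POST /api/users HTTP/1.1" 200\n},
);

printf "%s => %d\n", $_, $counters{$_} for sort keys %counters;
```

The expensive extraction regexp only runs on lines that already passed
the cheap filter, which is the point of splitting the match in two.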
Anyway, currently all this is done in a single process, which parses
approx. 30000 lines per second. The CPU usage for this process is
100%, so the bottleneck is in the parsing part.
I have changed the script to use threads + threads::shared +
Thread::Queue. I read data from logs like this:
Code
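until( $no_more_data ) {
    my @buffer;
    foreach( (1..$buffer_size) ) {
        # defined() so that a final line of "0" is not mistaken for EOF
        if( defined( my $line = <> ) ) {
            push( @buffer, $line );
        } else {
            # EOF: flush the last (partial) buffer, then queue one
            # undef per worker thread as a stop marker
            $no_more_data = 1;
            $q_in->enqueue( \@buffer );
            foreach( (1..$cpu_count) ) {
                $q_in->enqueue( undef );
            }
            last;
        }
    }
    # a full buffer was read without hitting EOF
    $q_in->enqueue( \@buffer ) unless $no_more_data;
}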
Then I create $cpu_count threads, each of which does something like
this:
Code
sub parser {
    my $counters = {};
    while( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{ $buffer } ) {
            # do its thing
        }
    }
    return $counters;
}
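For completeness, here is a rough sketch of how the worker threads
could be started and how their per-thread counters could be merged
after join(). It is not the actual script: the value of $cpu_count,
the sample buffers, the GET-counting stand-in for the real matching,
and the %total hash are assumptions made for illustration. It needs a
perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $cpu_count = 2;                       # assumed value for this sketch
my $q_in      = Thread::Queue->new();

sub parser {
    my $counters = {};
    while ( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{$buffer} ) {
            # stand-in for the real matching: count lines containing GET
            $counters->{GET}++ if index( $line, 'GET' ) >= 0;
        }
    }
    return $counters;                    # copied back to the parent on join()
}

# map calls create() in list context, so join() is also called in
# list context further down.
my @workers = map { threads->create( \&parser ) } 1 .. $cpu_count;

# Made-up buffers standing in for batches of log lines.
$q_in->enqueue( [ "GET /a\n", "POST /b\n" ] );
$q_in->enqueue( [ "GET /c\n" ] );
$q_in->enqueue(undef) for 1 .. $cpu_count;    # one stop marker per worker

# Merge the per-thread counters into a single hash.
my %total;
foreach my $thr (@workers) {
    my ($counters) = $thr->join();
    $total{$_} += $counters->{$_} for keys %{$counters};
}
print "GET lines: $total{GET}\n";
```

Each worker keeps its own private $counters hash and the totals are
only combined once at the end, so no counter is shared between threads
while parsing.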
Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
faster than the single-process script, consumes about 2-3 times more
memory, and uses about 2-3 times more CPU.
I've also tried abandoning Thread::Queue and just using
threads::shared with a lock/cond_wait/cond_signal combination, without
much success.
I've tried playing with $cpu_count and $buffer_size, and found that
raising $buffer_size beyond 1000 doesn't make much difference, and
$cpu_count > 2 actually makes things a lot worse.
Any ideas why in the world it's so slow? I did some research and
couldn't find much info, other than that the way I do it is pretty
much the way it should be done, unless I'm missing something...
Hope somebody can enlighten me...
THANKS!