Re: perl multithreading performance
- From: xhoster@xxxxxxxxx
- Date: 27 Aug 2008 22:17:29 GMT
dniq00@xxxxxxxxx wrote:
Hello, oh almighty perl gurus!
I'm trying to implement multithreaded processing for the humongous
amount of logs that I'm currently processing in 1 process on a 4-CPU
server.
Start 4 processes, telling each one to work on a different log file.
Either do this from the command line, or implement it with fork or system,
depending on how automatic it all has to be.
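A minimal sketch of the fork variant, assuming one child per log file named on the command line. The names process_log and run_in_parallel are hypothetical, and the "GET" count is only a stand-in for the real parsing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-file worker: counts lines containing "GET".
# Replace the body with the real parsing logic.
sub process_log {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $hits = 0;
    while ( defined( my $line = <$fh> ) ) {
        $hits++ if index( $line, 'GET' ) >= 0;
    }
    close $fh;
    return $hits;
}

# Fork one child per file, wait for all of them, and return a
# hash reference mapping file name => child exit status.
sub run_in_parallel {
    my (@files) = @_;
    my %kids;
    for my $file (@files) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: do the work, report, and exit without returning.
            my $hits = process_log($file);
            print "$file: $hits\n";
            exit 0;
        }
        $kids{$pid} = $file;    # Parent: remember which child has which file.
    }
    my %status;
    while ( ( my $pid = wait() ) != -1 ) {
        $status{ $kids{$pid} } = $? >> 8 if exists $kids{$pid};
    }
    return \%status;
}

run_in_parallel(@ARGV) if @ARGV;
```

Each child has its own address space, so nothing is shared and no locking is needed. The cost is that results have to come back through output, files, or exit status.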
Anyway, currently all this is done in a single process, which parses
approx. 30000 lines per second.
If you just check for GET (and then ignore the result), how many lines per
second would it do?
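One way to get that baseline number is a loop that reads each line, tests for GET, and throws the result away. This is only a sketch: baseline_scan is a hypothetical name, and the plain /GET/ match is an assumption about what "check for GET" means here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Read every line from the filehandle, test for "GET", and ignore
# the outcome. Returns (line count, elapsed seconds).
sub baseline_scan {
    my ($fh) = @_;
    my $lines = 0;
    my $hits  = 0;
    my $start = time();
    while ( defined( my $line = <$fh> ) ) {
        $lines++;
        $hits++ if $line =~ /GET/;
    }
    return ( $lines, time() - $start );
}

if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ( $lines, $secs ) = baseline_scan($fh);
    printf "%d lines in %.2f s (%.0f lines/s)\n",
        $lines, $secs, $secs > 0 ? $lines / $secs : 0;
}
```

If this runs much faster than 30000 lines per second, the time is going into the parsing itself rather than into reading the file.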
The CPU usage for this process is
100%, so the bottleneck is in the parsing part.
I have changed the script to use threads + threads::shared +
Thread::Queue. I read data from logs like this:
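Code
until( $no_more_data ) {
    my @buffer;
    foreach( (1..$buffer_size) ) {
        # defined() keeps a final line of "0" from ending the loop early
        if( defined( my $line = <> ) ) {
            push( @buffer, $line );
        } else {
            # End of input: flush the last batch, then send one
            # undef per consumer thread as a stop signal.
            $no_more_data = 1;
            $q_in->enqueue( \@buffer );
            foreach( (1..$cpu_count) ) {
                $q_in->enqueue( undef );
            }
            last;
        }
    }
    $q_in->enqueue( \@buffer ) unless $no_more_data;
}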
Then, I create $cpu_count threads, each of which does something like this:
What do you mean "then"? If you wait until all lines are enqueued before
you create the consumer threads, your entire log file will be in memory!
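A rough sketch of the ordering I mean, with the consumers started before the first line is read so the queue is drained while it is being filled. It assumes a perl built with thread support. The name run_pipeline, the values of $cpu_count and $buffer_size, and the line-counting worker body are all placeholders:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Assumed values for illustration; tune for the real workload.
my $cpu_count   = 4;
my $buffer_size = 1000;

sub run_pipeline {
    my ($file) = @_;
    my $q_in = Thread::Queue->new();

    # Start the consumers first so they work while the producer reads.
    my @workers;
    for ( 1 .. $cpu_count ) {
        my ($thr) = threads->create( sub {
            my $count = 0;
            while ( defined( my $buffer = $q_in->dequeue() ) ) {
                $count += scalar @$buffer;    # stand-in for real parsing
            }
            return $count;
        } );
        push @workers, $thr;
    }

    # Producer: enqueue batches of lines as they are read.
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @buffer;
    while ( defined( my $line = <$fh> ) ) {
        push @buffer, $line;
        if ( @buffer >= $buffer_size ) {
            $q_in->enqueue( [@buffer] );
            @buffer = ();
        }
    }
    $q_in->enqueue( [@buffer] ) if @buffer;
    close $fh;

    # One undef per worker signals end of input.
    $q_in->enqueue(undef) for 1 .. $cpu_count;

    my $total = 0;
    for my $thr (@workers) {
        my ($n) = $thr->join();
        $total += $n;
    }
    return $total;
}

print run_pipeline( $ARGV[0] ), "\n" if @ARGV;
```

Note that this alone does not bound memory: if the reader is faster than the parsers, the queue still grows. It only guarantees that parsing overlaps with reading.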
Code
sub parser {
    my $counters = {};
    while( my $buffer = $q_in->dequeue() ) {
        foreach my $line ( @{ $buffer } ) {
            # do its thing
        }
    }
    return $counters;
}
When $counters is returned, what do you do with it? That could be
another synchronization bottleneck.
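If each thread keeps a private hash and the totals are combined only once, after join, the merge needs no locking at all. A sketch, assuming $counters is a flat hash of name => count (the post does not show its real shape) and using the hypothetical name merge_counters:

```perl
use strict;
use warnings;

# Add each count from one worker's private hash into the running totals.
sub merge_counters {
    my ( $totals, $partial ) = @_;
    $totals->{$_} += $partial->{$_} for keys %$partial;
    return $totals;
}

# Usage sketch, assuming @workers holds thread objects whose entry
# sub returns a hash reference of counters:
#   my %totals;
#   merge_counters( \%totals, $_->join() ) for @workers;
```

The merge then runs once per thread in the main thread, instead of once per line under a lock.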
Everything works fine, HOWEVER! It's all so damn slow! It's only 10%
faster than the single-process script, consumes about 2-3 times more
memory and about as many times more CPU.
That doesn't surprise me.
I've also tried abandoning Thread::Queue and just using
threads::shared with a lock/cond_wait/cond_signal combination, without
much success.
This also doesn't surprise me. Synchronizing shared access is hard and
often slow.
I've tried to play with $cpu_count and $buffer_size, and found that
raising $buffer_size above 1000 doesn't make much difference, and
$cpu_count > 2 actually makes things a lot worse.
Any ideas why in the world it's so slow? I did some research and
couldn't find a lot of info, other than that the way I do it is pretty
much the way it should be done, unless I'm missing something...
Hope somebody can enlighten me...
If you post fully runnable dummy code, and a simple program which
generates log-file data to put through it, I probably couldn't resist the
temptation to play around with it and find the bottlenecks.
Xho