Re: Naive threading performance questions
- From: "Worky Workerson" <worky.workerson@xxxxxxxxx>
- Date: 27 Oct 2006 09:19:21 -0700
> It seems pretty weird that Text::CSV_XS would be slower than a database
> load. I'm not sure I'd give up on that avenue for optimization quite yet.
> Can you post an example of some data and code, and the top several subs
> from dprofpp? Maybe we can spot something that isn't going as well as it
> could be. Of course, if parallelization turns out to be easy and
> successful, then there is probably no point in delving too deeply. Is
> your data in ASCII/bytes, or in a wide character set?
You're right, it's me. I came up with the following example, which is
very similar to one of my data sets and loads, and basically shows that
I can do a lot better.
The input file line is of the form:
IP/MASK [key=value][:key=value]
where there is a limited set of known keys. Not every key is
necessarily present in an input line, but an (empty) CSV slot for it
must still be present in the output line. The keys and values are all
ASCII in this particular load (which is not true of all my data
sources), except for a couple of odd characters here and there, which
can be safely deleted. A sample line would be:
0.0.0.0/0 keya=vala:keyb=valb:keyc=valc:keyd=vald:keybig=this is bigger value:keyanother=this is another key value
And here is my code. I've factored out the line processing so that it
would show up in the dprofpp. Again, sorry if there are any hand-copy
errors ....
#!/usr/bin/perl
use strict; use warnings;
use IO::File; use Text::CSV_XS;

# Output columns, in order; a key missing from a line yields an empty slot.
my @valid_columns = qw/ keya keyb keyc keyd keye keyf keyg keybig
                        keyanother /;
my %valid_columns = map {$_ => 1} @valid_columns;
my $output_csv = Text::CSV_XS->new({eol=>"\n", 'binary' => 1});
$output_csv->print(*STDOUT, process_line($_)) while (<>);

sub process_line {
    my ($line) = @_;
    my ($ip_range, $rest) = split /\s+/, $line, 2;
    chomp($rest);
    my %ip_details = (ip_range => $ip_range);
    # split on ':', then split each element on '=' and stick in hash
    for my $pair (split /:/, $rest) {
        my ($k, $v) = split /=/, $pair;
        $ip_details{$k} = $v;
    }
    # fix up column with random bad bytes
    # (must bind with =~; a plain = would run s/// against $_ instead)
    $ip_details{keya} =~ s/[^\x20-\x7e]//g if defined $ip_details{keya};
    my @cols = map { $ip_details{$_} } @valid_columns;
    return \@cols;
}
and the top of the dprofpp looks like:
%Time ExclSec CumulS #Calls sec/call Csec/c Name
 81.7   10.38 10.386 214000   0.0000 0.0000 main::process_line
 10.7   1.362  1.838 214000   0.0000 0.0000 Text::CSV_XS::print
 3.75   0.476  0.476 214000   0.0000 0.0000 IO::Handle::print
... other stuff is <1s
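
Since process_line accounts for most of the time in that profile, one
variant that might be worth timing against it is sketched below. It is
only a sketch: the name process_line_slice and the sample input are made
up for illustration, it uses core Perl only (no Text::CSV_XS), and it
builds the output row with a hash slice instead of map. Whether it is
actually any faster on the real data would have to be measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Output columns, in order (same key names as in the script above).
my @valid_columns = qw/ keya keyb keyc keyd keye keyf keyg keybig keyanother /;

# Hypothetical variant of process_line; the name is made up for this sketch.
sub process_line_slice {
    my ($line) = @_;
    chomp $line;
    my ($ip_range, $rest) = split /\s+/, $line, 2;
    my %ip_details = (ip_range => $ip_range);

    # One pass over the key=value pairs. The limit of 2 is an assumption
    # of this sketch: it keeps any '=' that appears inside a value.
    for my $pair (split /:/, defined $rest ? $rest : '') {
        my ($k, $v) = split /=/, $pair, 2;
        $ip_details{$k} = $v;
    }

    # Delete bytes outside printable ASCII from keya, if the key is present.
    $ip_details{keya} =~ s/[^\x20-\x7e]//g if defined $ip_details{keya};

    # Hash slice: keys missing from the line come back as undef,
    # which is what ends up as an empty CSV slot.
    return [ @ip_details{@valid_columns} ];
}

# Made-up sample line with one bad byte (\x01) inside keya.
my $sample = "0.0.0.0/0 keya=va\x01la:keyb=valb:keybig=this is bigger value\n";
my $cols   = process_line_slice($sample);

# Show the slots, with '|' as a visible separator and undef as empty.
print join('|', map { defined $_ ? $_ : '' } @$cols), "\n";
```

The returned array reference has the same shape as the one from
process_line, so it could be handed to $output_csv->print in the same
way for a side-by-side run under dprofpp.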
Greatly appreciate all your help.