Re: Naive threading performance questions



It seems pretty weird that Text::CSV_XS would be slower than a database
load. I'm not sure I'd give up on that avenue for optimization quite yet.
Can you post an example of some data and code, and the top several subs
from dprofpp? Maybe we can spot something that isn't going as well as it
could be. Of course, if parallelization turns out to be easy and
successful, then there is probably no point in delving too deeply. Is
your data in ASCII/bytes, or in a wide character set?

You're right, it's me. I came up with the following example, which is
very similar to one of my data sets and loads, and it basically shows
that I can do a lot better.

Each line of the input file is of the form:

IP/MASK [key=value][:key=value]

where there is a limited set of known keys. Not every key is
necessarily present in a given input line, but an (empty) CSV slot for
it must still be present in the output line. The keys and values are
all ASCII in this particular load (which is not true of all my data
sources), except for a couple of odd characters here and there, which
can be safely deleted. A sample line would be:

0.0.0.0/0 keya=vala:keyb=valb:keyc=valc:keyd=vald:keybig=this is bigger value:keyanother=this is another key value
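
To make the intended mapping concrete, here is a rough sketch in core
Perl only. It is not the real load: line_to_columns is a hypothetical
name, the key list is the one used in the script, and putting the IP
range in the first column is an assumption.

```perl
use strict;
use warnings;

# Assumed column order: the IP range first, then the known keys.
my @known_keys = qw/ keya keyb keyc keyd keye keyf keyg keybig keyanother /;

# Hypothetical helper: turn one input line into a fixed-width list of columns.
sub line_to_columns {
    my ($line) = @_;
    chomp $line;
    my ($ip_range, $rest) = split /\s+/, $line, 2;
    $rest = '' unless defined $rest;

    my %details;
    for my $pair (split /:/, $rest) {
        next unless length $pair;            # skip empty segments such as "a=b::c=d"
        my ($k, $v) = split /=/, $pair, 2;   # limit 2 keeps any '=' inside the value
        $details{$k} = $v;
    }

    # Keys missing from the line become empty slots.
    return [ $ip_range,
             map { defined $details{$_} ? $details{$_} : '' } @known_keys ];
}

my $cols = line_to_columns(
    "0.0.0.0/0 keya=vala:keyb=valb:keyc=valc:keyd=vald:"
  . "keybig=this is bigger value:keyanother=this is another key value\n"
);
print join(',', @$cols), "\n";
```

For the sample line this prints ten columns, with keye, keyf and keyg
left empty:

0.0.0.0/0,vala,valb,valc,vald,,,,this is bigger value,this is another key value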

And here is my actual code. I've factored out the line processing so
that it would show up in the dprofpp output. Again, sorry if there are
any hand-copy errors ....

#!/usr/bin/perl

use strict; use warnings;
use IO::File; use Text::CSV_XS;

# Known keys, in output column order.
my @valid_columns = qw/ keya keyb keyc keyd keye keyf keyg keybig keyanother /;
my %valid_columns = map {$_ => 1} @valid_columns;

my $output_csv = Text::CSV_XS->new({eol=>"\n", 'binary' => 1});
$output_csv->print(*STDOUT, process_line($_)) while (<>);

sub process_line {
    my ($line) = @_;
    my ($ip_range, $rest) = split /\s+/, $line, 2;
    $rest = '' unless defined $rest;   # line with an IP range and no keys
    chomp($rest);

    my %ip_details = (ip_range => $ip_range);

    # split on ':', then split each element on '=' and stick in hash
    for my $pair (split /:/, $rest) {
        my ($k, $v) = split /=/, $pair;
        $ip_details{$k} = $v;
    }

    # fix up column with random bad bytes
    # (=~ binds s/// to the column; a plain = would run it on $_ instead)
    $ip_details{keya} =~ s/[^\x20-\x7e]//g if defined $ip_details{keya};

    # one slot per known key; keys missing from the line come out undef (empty)
    my @cols = map { $ip_details{$_} } @valid_columns;
    return \@cols;
}

and the top of the dprofpp output looks like:

%Time ExclSec CumulS #Calls sec/call Csec/c Name
 81.7   10.38 10.386 214000   0.0000 0.0000 main::process_line
 10.7   1.362  1.838 214000   0.0000 0.0000 Text::CSV_XS::print
 3.75   0.476  0.476 214000   0.0000 0.0000 IO::Handle::print
... other stuff is <1s

Greatly appreciate all your help.
