Re: Performance question
From: Paul Lalli (mritty_at_gmail.com)
Date: 03/29/05
Date: Tue, 29 Mar 2005 18:35:47 GMT
Dave Sill wrote:
> One of my users made the following observation. I'm only an
> occasional, lightweight Perl user, so I can't explain what he's
> seeing. Can anyone shed some light on it? H/W is a pretty large/fast
> Dell server running RHEL 3.
>
> ----
> I manufactured a 401x401 [ linearly =160801 element] array [@judy]
> each element having string values like
>
> 01000000001110000000000000000001
>
> I needed to make a comma delimited ascii file of this data.
>
> I decided a single IO write of a string would be the fastest, so i
> made a string
> $str="";
> foreach $i(0..$#judy-1)
> {
> $str=$str."$judy[$i],"
> }
> $str=$str."$judy[$#judy]"; open(OUT,">$output_file");print OUT
> $str;close(OUT);
> `gzip -f $output_file`;
>
> this took 16 minutes.
>
> i tried it the slow way,
>
> open(OUT,">$output_file");
> foreach $i(0..$#judy-1)
> {
> print OUT "$judy[$i],";
> }
> print OUT "$judy[$#judy]"; close(OUT);
> `gzip -f $output_file`;
>
> with 160K IOs, this took about 3 seconds.
>
> the gz files were different, but diff said uncompressed they were the
> same.
Your user has an odd definition of "faster" and "slower". I don't know
what would make the user think that storing the entire 160,801-element
array in memory TWICE would be faster than just printing what's needed
when it's needed.
In the first algorithm, the user is building one large string and
appending to it each time through the loop. Towards the end, this means
holding over 160,000 x 32 bytes (about 5 MB) in a single scalar and
asking perl to append to the end of it. If each $str=$str."..."
assignment copies the existing string instead of extending it in place,
the total work grows quadratically with the number of elements. Then
finally the user asks perl to make one absurdly large I/O access.
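In the second algorithm, the user is simply printing 33 bytes (32
characters plus a comma) repeatedly. Output through a filehandle is
buffered by default, so this does not amount to 160K separate writes to
disk.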
Neither of those ways is especially good Perl code, of course. The
first would be better written:
my $str = join (',', @judy);
open my $out, '>', $output_file or die "Can't open output: $!";
print $out $str;
close $out;
The second would be better written:
open my $out, '>', $output_file or die "Can't open output: $!";
{
local $, = ',';
print $out @judy;
}
close $out;
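As a quick sanity check that the two rewrites above emit the same bytes, here is a minimal sketch. It assumes a small stand-in for @judy and writes to in-memory filehandles rather than $output_file; the sub names via_join and via_ofs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in data; the original @judy has 160,801 elements.
my @judy = ('01000000001110000000000000000001') x 5;

# Version 1: build the whole string with join, then print it once.
sub via_join {
    my @data = @_;
    my $buf  = '';
    open my $out, '>', \$buf or die "Can't open in-memory handle: $!";
    print $out join(',', @data);
    close $out;
    return $buf;
}

# Version 2: let print insert the separator via $, (output field separator).
sub via_ofs {
    my @data = @_;
    my $buf  = '';
    open my $out, '>', \$buf or die "Can't open in-memory handle: $!";
    {
        local $, = ',';    # restored automatically at the end of this block
        print $out @data;
    }
    close $out;
    return $buf;
}

print via_join(@judy) eq via_ofs(@judy) ? "identical\n" : "different\n";
```

Because $, is only placed between the list elements, neither version leaves a trailing comma, so the special case for the last element in the original loops is not needed.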
I would suggest your user use the Benchmark module to determine which of
these is actually faster.
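A rough sketch of such a comparison, using cmpthese from the core Benchmark module. The data size (10,000 elements instead of 160,801), the in-memory filehandles, and the sub names are assumptions made for illustration; real file I/O and the full-size array may rank the candidates differently.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in data, smaller than the original 401x401 array.
my @judy = ('01000000001110000000000000000001') x 10_000;

# The user's first approach: grow one string, then print it once.
sub concat_loop {
    my $str = '';
    foreach my $i (0 .. $#judy - 1) {
        $str = $str . "$judy[$i],";
    }
    $str = $str . $judy[$#judy];
    my $buf = '';
    open my $out, '>', \$buf or die "Can't open in-memory handle: $!";
    print $out $str;
    close $out;
    return $buf;
}

# Build the string with join, then print it once.
sub join_once {
    my $buf = '';
    open my $out, '>', \$buf or die "Can't open in-memory handle: $!";
    print $out join(',', @judy);
    close $out;
    return $buf;
}

# Print the list directly, with $, supplying the commas.
sub ofs_print {
    my $buf = '';
    open my $out, '>', \$buf or die "Can't open in-memory handle: $!";
    {
        local $, = ',';
        print $out @judy;
    }
    close $out;
    return $buf;
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, {
    concat_loop => \&concat_loop,
    join_once   => \&join_once,
    ofs_print   => \&ofs_print,
});
```

cmpthese prints a table of rates and relative percentages. The numbers depend on the perl version, the data size, and where the output goes, so treat them as a guide for this machine only.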
Paul Lalli