Re: Handling Large files (a few Gb) in Perl



On Jul 16, 7:40 am, "sydc...@xxxxxxxxx" <sydc...@xxxxxxxxx> wrote:
I am a beginner (or worse) at Perl.

I have a need to find the longest line (record) in a file. The below
code works neatly for small files.
But when I need to read huge files (in the order of Gb), it is very
slow.

Could someone help me in finding what way I could make Perl work the
best way for processing huge files such as these?

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
my $prev=-1;
my $curr=0;

my ($sec,$min,$hour,$com) = localtime(time);
print "Start time - $hour:$min:$sec \n";

open(F1, "c:\\perl\\syd\\del.txt");

while (<F1>)
{
$curr = index($_, "\x0A");

Well, here's one improvement you could make. Don't force Perl to
search through each string looking for a specific character. Just ask
it what the length of the string is. In my tests, that's about 10%
faster:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/:all/;

sub use_index {
    open my $fh, '<', 'ipsum.txt' or die $!;
    my $prev = 0;
    while (<$fh>) {
        my $cur = index($_, "\x0A");
        if ($cur > $prev) {
            $prev = $cur;
        }
    }
}

sub use_length {
    open my $fh, '<', 'ipsum.txt' or die $!;
    my $prev = 0;
    while (<$fh>) {
        my $cur = length;
        if ($cur > $prev) {
            $prev = $cur;
        }
    }
}

cmpthese(timethese(100_000, { length => \&use_length, index => \&use_index }));
__END__

Benchmark: timing 100000 iterations of index, length...
     index: 26 wallclock secs (19.81 usr +  6.27 sys = 26.08 CPU) @ 3834.36/s (n=100000)
    length: 24 wallclock secs (17.10 usr +  6.47 sys = 23.57 CPU) @ 4242.68/s (n=100000)
         Rate  index length
index  3834/s     --   -10%
length 4243/s    11%     --
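
One thing to watch: index($_, "\x0A") gives the position of the newline, while length counts the newline too, and index returns -1 for a final line that has no trailing newline. If you want the record length without the line ending, chomp first. Here is a rough sketch of how the whole task might look with that in place. The sub name longest_line and the command-line handling are my own inventions for illustration, not anything from your script, and I haven't timed it against a multi-gigabyte file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the length
# of the longest line in $path and the line number where it first occurs.
sub longest_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($max_len, $max_line) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;                 # drop the trailing newline, if any
        my $len = length $line;      # bytes here, since no encoding layer is set
        if ($len > $max_len) {
            ($max_len, $max_line) = ($len, $.);   # $. is the current line number
        }
    }
    close $fh;
    return ($max_len, $max_line);
}

# Only run when a file name is given on the command line.
if (@ARGV) {
    my ($len, $line_no) = longest_line($ARGV[0]);
    print "Longest line: $len (line $line_no)\n";
}
```

You would run it as something like "perl longest.pl del.txt" (the script name is just a placeholder). It still reads one line at a time, so memory use stays small no matter how big the file is.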



Paul Lalli
