Re: FAQ 5.4 How do I delete the last N lines from a file?



On 5/16/2010 12:00 AM, PerlFAQ Server wrote:
This is an excerpt from the latest version of perlfaq5.pod, which
comes with the standard Perl distribution. These postings aim to
reduce the number of repeated questions as well as allow the community
to review and update the answers. The latest version of the complete
perlfaq is at http://faq.perl.org .

--------------------------------------------------------------------

5.4: How do I delete the last N lines from a file?

(contributed by brian d foy)

The easiest conceptual solution is to count the lines in the file then
start at the beginning and print the number of lines (minus the last N)
to a new file.

Most often, the real question is how you can delete the last N lines
without making more than one pass over the file, or how to do it without
a lot of copying. The easy concept is the hard reality when you might
have millions of lines in your file.

One trick is to use "File::ReadBackwards", which starts at the end of
the file. That module provides an object that wraps the real filehandle
to make it easy for you to move around the file. Once you get to the
spot you need, you can get the actual filehandle and work with it as
normal. In this case, you get the file position at the end of the last
line you want to keep and truncate the file to that point:

    use File::ReadBackwards;

    my $filename = 'test.txt';
    my $Lines_to_truncate = 2;

    my $bw = File::ReadBackwards->new( $filename )
        or die "Could not read backwards in [$filename]: $!";

    my $lines_from_end = 0;
    until( $bw->eof or $lines_from_end == $Lines_to_truncate )
    {
        print "Got: ", $bw->readline;
        $lines_from_end++;
    }

    truncate( $filename, $bw->tell );

The "File::ReadBackwards" module also has the advantage of setting the
input record separator to a regular expression.

You can also use the "Tie::File" module which lets you access the lines
through a tied array. You can use normal array operations to modify your
file, including setting the last index and using "splice".
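
As a rough illustration of the "Tie::File" approach, here is a minimal sketch that sets the last index of the tied array to shrink the file in place. The helper name delete_last_lines and its clamping behaviour are assumptions made for this example; they are not part of the FAQ.

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper (not from the FAQ): remove the last $n lines
# of $filename in place and return the number of lines kept.
sub delete_last_lines {
    my ( $filename, $n ) = @_;

    tie my @lines, 'Tie::File', $filename
        or die "Could not tie [$filename]: $!";

    # Clamp at zero so that asking for more lines than the file
    # holds simply empties it.
    my $keep = @lines - $n;
    $keep = 0 if $keep < 0;

    # Setting the last index shrinks the tied array, and Tie::File
    # shortens the underlying file to match.
    $#lines = $keep - 1;

    untie @lines;
    return $keep;
}
```

A call such as delete_last_lines( 'test.txt', 2 ) would then drop the last two lines. Tie::File has to find the line boundaries by reading forward through the file, so on very large files this is likely to be slower than the File::ReadBackwards approach.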

Feeling bored, I compared the code in the FAQ with some bash code that
would achieve the same results. I also ran some generic Perl that does
basically the same thing as the shell script (code at bottom).

The test file is named 'puke'. It contains the integers 0 through
999999, 1 million rows in total. The test is to exclude the last 10000
lines. Setup: perl 5.10.1 on Cygwin, 4 GB RAM, dual-core Intel.

Anyway, in this not really scientific test, the FAQ method using Uri's
File::ReadBackwards module is the winner. I suppose this is the expected
result, but I thought the shell code would be more competitive.

$ time perl faq.pl > top_n-10000

real 0m0.219s
user 0m0.093s
sys 0m0.061s

$ time cat puke | wc -l | xargs echo -10000 + | bc \
| xargs echo head puke -n | sh > top_n-10000

real 0m0.312s
user 0m0.090s
sys 0m0.121s

$ time perl temp.pl > top_n-10000

real 0m0.858s
user 0m0.701s
sys 0m0.062s

-----------------
temp.pl
-----------------
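use strict;
use warnings;

my $num_lines_exclude = 10000;

open( my $fh, '<', 'puke' ) or die "Could not open puke: $!";

# First pass: count the lines in the file.
my $line_count = 0;
while (<$fh>) {
    $line_count++;
}

# Second pass: rewind and print everything except the last N lines.
seek( $fh, 0, 0 );
my $lines_to_read = $line_count - $num_lines_exclude;
while ( $lines_to_read > 0 ) {
    my $line = <$fh>;
    print $line;
    $lines_to_read--;
}
close($fh);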