Re: Compress::Zlib vs. external gzip call
From: odigity (ofer_at_netapt.com)
Date: 28 Oct 2004 14:13:37 -0700
Stuart Moore <firstname.lastname@example.org> wrote in message news:<email@example.com>...
> odigity wrote:
> > I'm writing a script that needs to run in as fast a time as possible.
> > Every minute counts. The script crawls a tree of gzipped files
> > totalling about 30gb. Originally I was calling open() with "gzip
> > $file |", but I hate making external calls - it requires a fork, and
> > you have very limited communication with the process for catching
> > errors and such. I always like using perl functions and modules when
> > possible over external calls. However, I wanted to make sure I
> > wouldn't take a performance hit before switching to Compress::Zlib.
> Just thinking out loud here:
> - Would the time measured by "Benchmark" include the time to start gzip?
> Does it measure total time, or just time when the perl process is using
> the CPU? Do the times mentioned match what you'd get with a stopwatch?
Off the top of my head, I'm not sure whether Benchmark can account for
child processes. I probably need to take that into account and
just use straight clock time and enough iterations to smooth out system
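
A rough sketch of the clock-time comparison, for what it's worth. It assumes Time::HiRes and Compress::Zlib are installed and that a gzip binary is on the PATH. The sample file, the iteration count, and the helper names (wallclock, count_lines_zlib, count_lines_pipe) are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use File::Temp qw(tempfile);
use Compress::Zlib;

# Build a small gzipped sample file so the sketch is self-contained.
my ($fh, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
close $fh;
my $gz = gzopen($path, 'wb') or die "gzopen for write failed: $gzerrno";
$gz->gzwrite("line $_\n") for 1 .. 10_000;
$gz->gzclose;

# Hypothetical helper: time a code ref by wallclock, so that time spent
# waiting on child processes is included in the result.
sub wallclock {
    my ($code, $iterations) = @_;
    my $start = [gettimeofday];
    $code->() for 1 .. $iterations;
    return tv_interval($start);
}

# In-process decompression with Compress::Zlib.
sub count_lines_zlib {
    my $in = gzopen($path, 'rb') or die "gzopen for read failed: $gzerrno";
    my ($line, $count) = ('', 0);
    $count++ while $in->gzreadline($line) > 0;
    $in->gzclose;
    return $count;
}

# External decompression: one fork/exec of gzip per call.
sub count_lines_pipe {
    open(my $pipe, '-|', 'gzip', '-dc', $path) or die "cannot run gzip: $!";
    my $count = 0;
    $count++ while <$pipe>;
    close $pipe or die "gzip exited with status $?";
    return $count;
}

printf "zlib: %.3fs\n", wallclock(\&count_lines_zlib, 20);
printf "pipe: %.3fs\n", wallclock(\&count_lines_pipe, 20);
```

Running this against both a tiny file and a large one should show how much of the pipe version's time is process startup rather than decompression.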
> - Might it be worth looking at some of the smaller files as well,
> possibly the time taken to open gzip is less significant on the large
> ones than the small ones?
Perhaps... most of the files are small, but I think most of the time
is spent on the few big files. And I also simply wanted to determine
which was faster at actual decompression. Still, a valid point.
> - Is there any way you can keep the gzip process open and only call it
> once to decompress multiple files? One fork is better than many
Hmm... I suppose I could use open2 to connect to both STDIN and STDOUT
and keep feeding it, but then I'd have to read the files myself into
the perl environment and print it out to the gzip process, which I'd
bet money will be slower. And there are too many files to build a
list and shove them onto a single command line. The gzip man page shows
no option for reading a list of files from anywhere other than the
command line.
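
A possible middle ground, sketched under assumptions: pass the files to gzip in batches, so each fork handles several files without any one command line getting too long. This is the same idea xargs uses, not something discussed above. The batch size, the sample files, and the helper name count_lines_batched are hypothetical. Note that gzip -dc concatenates its output, so file boundaries are lost. That only works if the script does not need to know which file a line came from.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Compress::Zlib;

# Create a few small gzipped files so the sketch runs on its own.
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $n (1 .. 7) {
    my $path = "$dir/sample$n.gz";
    my $gz = gzopen($path, 'wb') or die "gzopen failed: $gzerrno";
    $gz->gzwrite("file $n line $_\n") for 1 .. 3;
    $gz->gzclose;
    push @files, $path;
}

# Hypothetical helper: decompress files in batches, one gzip process
# per batch instead of one per file.
sub count_lines_batched {
    my ($batch_size, @paths) = @_;
    my ($lines, $forks) = (0, 0);
    while (my @batch = splice(@paths, 0, $batch_size)) {
        # List-form pipe open: no shell involved, so no quoting issues.
        open(my $pipe, '-|', 'gzip', '-dc', @batch)
            or die "cannot run gzip: $!";
        $forks++;
        $lines++ while <$pipe>;
        close $pipe or die "gzip exited with status $?";
    }
    return ($lines, $forks);
}

my ($lines, $forks) = count_lines_batched(3, @files);
print "$lines lines from ", scalar(@files),
      " files using $forks gzip processes\n";
```

With a real tree the batch size would be limited by the system's maximum argument length, so a few hundred paths per batch is probably a safer starting point than a few thousand.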