Re: question about forked processes writing to the same file
- From: xhoster@xxxxxxxxx
- Date: 24 Oct 2005 18:23:04 GMT
"it_says_BALLS_on_your forehead" <simon.chao@xxxxxxx> wrote:
>
> hey Xho, i tried to trim down the code as much as possible while
> maintaining the exact same functionality of the code. in this
> simplified version, there are 2 scripts: betaProcess.pl and
> betaParse.pl
I think you could have trimmed it down a lot more in most places, and
a little less in others. :)
For example, all the stuff in betaProcess that is in paths other than
those invoking betaParse is almost certainly irrelevant.
Also, I think you need to rear back and think of the big picture some
more. You are forking off 16 processes, and you are creating 10 log files,
and you want all 16 of those processes to write to all 10 of those log
files, so you have 160 filehandles all fighting with each other. Is that
really necessary? Do the 10 log files serve 10 different purposes
(unlikely, it seems), or are they just to keep the size of any given log
file down, or what?
More on the big picture. A couple of weeks ago when you started asking here
about parsing (or was it FTPing?) lots of files, I thought
Parallel::ForkManager was the right solution. But you keep incrementally
adding new wrinkles, and I fear that by the time you are done adding them
perhaps the right solution will no longer be Parallel::ForkManager but
rather threads or event-loops or some RDBMS-interfaced program or even some
other language entirely. Parallelization is inherently challenging, and it
probably needs some deep big-picture thinking, not incremental tinkering.
> ##
> # betaParse.pl
> ##
> use strict;
> use Cwd;
> use Date::Manip;
> use File::Basename;
> use File::Copy;
>
> my $rawCounts = 0;
>
> my $numOutputFiles = 10;
>
> open(my $fh_in, "gzip -dc $inputFile|") || dieWithMail("Can't open
> $inputFile $!\n");
There is no $inputFile! Like I said, too much trimming in places.
>
> $status = &reformatLogs($fh_in);
>
> close $fh_in;
>
> if ($status == 1) {
>
> system("touch $inputFile.DONE");
> }
>
> #--- subs ---
>
> sub reformatLogs {
> my ($fh_in) = @_;
>
> while( <$fh_in> ) {
> $rawCounts++;
> chomp;
>
> # process $_
>
> # evenly distribute data to output files
> my $section = $rawCounts % $numOutputFiles;
I don't understand what you want to accomplish with this. Do your lines
need to be parceled out to output files in this inherently inefficient way?
That seems unlikely, as log files generally have one record per line and
the relative position of lines to each other is meaningless. I doubt you
have a good reason for doing this, but for the rest of the post I'll assume
you do.
In my experience, having all the children write to a common file for
monitoring, debugging, errors, etc, is fine. But for ordinary output, I've
almost never found it beneficial to have multiple children multiplex into
shared unstructured output files. It is usually far easier to produce
10,000 output files and then combine them at a later stage if that is
necessary.
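A rough sketch of that one-file-per-child approach, under some assumptions
of mine: the sub names (child_output_path, run_children, combine_outputs),
the "out.PID" naming scheme, and the temporary directory are all
hypothetical and not taken from betaProcess.pl or betaParse.pl. Each child
writes only to its own file, so no locking is needed, and the parent
concatenates the pieces once every child has exited.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical naming scheme: one output file per child, keyed by PID.
sub child_output_path {
    my ($dir, $pid) = @_;
    return "$dir/out.$pid";
}

# Fork $count children; each writes five lines to its own private file.
sub run_children {
    my ($dir, $count) = @_;
    my @kids;
    for my $n (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            my $path = child_output_path($dir, $$);
            open(my $fh, ">", $path) or die "can't open $path: $!\n";
            print {$fh} "child $n line $_\n" for 1 .. 5;
            close($fh) or die "can't close $path: $!\n";
            exit 0;
        }
        push @kids, $pid;
    }
    waitpid($_, 0) for @kids;    # wait until every child has finished
}

# Concatenate all per-child files into one; returns how many were merged.
sub combine_outputs {
    my ($dir, $combined) = @_;
    opendir(my $dh, $dir) or die "can't read $dir: $!\n";
    my @parts = sort grep { /^out\.\d+$/ } readdir($dh);
    closedir($dh);
    open(my $out, ">", $combined) or die "can't open $combined: $!\n";
    for my $part (@parts) {
        open(my $in, "<", "$dir/$part") or die "can't open $part: $!\n";
        while (my $line = <$in>) {
            print {$out} $line;
        }
        close($in);
    }
    close($out) or die "can't close $combined: $!\n";
    return scalar @parts;
}

my $dir = tempdir(CLEANUP => 1);
run_children($dir, 3);
my $parts = combine_outputs($dir, "$dir/combined");
print "combined $parts per-child files\n";
```

The combining step runs only after all the children are gone, so nothing
ever has two writers.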
> open( my $fh, ">>log$section" )
> || die "can't open log$section: $!\n";
## don't reopen the handle each time.
## rather, keep a hash of open handles
unless ( $section_handles{$section}) {
open (my $fh, ">>", "log$section")
or die "can't open log$section: $!\n";
$section_handles{$section}=$fh;
};
my $fh = $section_handles{$section};
> flock( $fh, 2 );
flock( $fh, LOCK_EX ) or die $!;  ## LOCK_EX needs: use Fcntl ':flock';
> print $fh "$_\n";
> close( $fh );
## and now you don't close the handle, but you still need it
## unlocked.
flock( $fh, LOCK_UN) or die $!;
> }
> return 1;
> }
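Pulled together and out of the quoting, the cached-handle and locking
changes suggested above might look something like this. It is only a
sketch: the sub names handle_for and write_record are mine, not anything
in betaParse.pl, and the "log$section" file names are kept from the quoted
code. perlfunc documents that flock flushes the handle before locking or
unlocking it, which is why there is no explicit flush before LOCK_UN.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);    # provides LOCK_EX and LOCK_UN

my %section_handles;     # section number => open append-mode handle

# Return the cached handle for a section, opening it on first use only.
sub handle_for {
    my ($section) = @_;
    unless ($section_handles{$section}) {
        open(my $fh, ">>", "log$section")
            or die "can't open log$section: $!\n";
        $section_handles{$section} = $fh;
    }
    return $section_handles{$section};
}

# Append one record while holding an exclusive lock on the file.
sub write_record {
    my ($section, $record) = @_;
    my $fh = handle_for($section);
    flock($fh, LOCK_EX) or die "can't lock log$section: $!\n";
    print {$fh} "$record\n";
    flock($fh, LOCK_UN) or die "can't unlock log$section: $!\n";
}
```

Inside the while loop, the open/flock/print/close sequence would then
shrink to a single write_record($section, $_) call.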
>
> ....is there a problem because of the mix of forked processes and
> system calls? perhaps i should change the system call to a function
> call (after making the necessary code changes)
While I don't think this causes this particular problem, I would make that
change anyway.
> previously, the open filehandles were at the top of betaParse.pl and
> there was no locking. this appeared to cause the record splicing,
> although the processing was about twice as fast. can you shed some
> light on this phenommenon?
I would guess that reopening a file for every line is going to be slow.
Locking the file for every line is probably also going to be kind of
slow, but probably not nearly as slow as re-opening it is. If it turns out to
be a bottleneck, you could batch up, say, 100 lines and write them in one
chunk, with only one lock around it. If you are going to do chunking, you
should probably whip up a module for it rather than putting it directly
into your code.
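For what it's worth, a chunking module along those lines could be as small
as the sketch below. Everything about it is an assumption on my part: the
package name BatchedLog, its add and flush methods, and the path and max
arguments are all made up for illustration. It queues lines in memory and
writes a whole batch under one exclusive lock.

```perl
use strict;
use warnings;

package BatchedLog;    # hypothetical module name
use Fcntl qw(:flock);

# path => file to append to, max => lines per batch (default 100).
sub new {
    my ($class, %args) = @_;
    open(my $fh, ">>", $args{path})
        or die "can't open $args{path}: $!\n";
    my $self = {
        fh   => $fh,
        path => $args{path},
        max  => $args{max} || 100,
        buf  => [],
    };
    return bless $self, $class;
}

# Queue one line; write the batch out once it reaches the limit.
sub add {
    my ($self, $line) = @_;
    push @{ $self->{buf} }, "$line\n";
    $self->flush if @{ $self->{buf} } >= $self->{max};
    return;
}

# Write all queued lines under a single exclusive lock.
sub flush {
    my ($self) = @_;
    return unless @{ $self->{buf} };
    my $fh = $self->{fh};
    flock($fh, LOCK_EX) or die "can't lock $self->{path}: $!\n";
    print {$fh} @{ $self->{buf} };
    flock($fh, LOCK_UN) or die "can't unlock $self->{path}: $!\n";
    $self->{buf} = [];
    return;
}

package main;
```

Usage would be roughly BatchedLog->new(path => "log3", max => 100), then
$log->add($_) per record, and an explicit $log->flush before the child
exits so a partial batch is not lost. The cost is that up to max lines
sit in memory, and would vanish if the process died before flushing.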
> you said before that the string length at
> which writing would start going crazy was around maybe 4096. the
> records in these weblogs are perhaps a maximum of 10 lines (this is a
> conservative estimate) on a 1024 x 768 res screen maximized.
I don't know how long 10 lines on a 1024x768 screen are. But I do know
that Perl has a length function :)
perl -lne 'print length' file
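If printing every length is too noisy, something like the following would
report just the longest record and how many exceed a given limit. This is
a sketch only: length_report is a name I made up, and 4096 is simply the
figure quoted above, not a verified threshold for your system.

```perl
use strict;
use warnings;

# Return (longest line length, number of lines longer than $limit).
# Lengths exclude the trailing newline, as with perl -l.
sub length_report {
    my ($path, $limit) = @_;
    open(my $fh, "<", $path) or die "can't open $path: $!\n";
    my ($max, $over) = (0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        my $len = length $line;
        $max = $len if $len > $max;
        $over++ if $len > $limit;
    }
    close($fh);
    return ($max, $over);
}
```

Call it as my ($max, $over) = length_report("file", 4096); and print the
two numbers.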
Xho