Re: optimize log parsing
- From: xhoster@xxxxxxxxx
- Date: 05 Oct 2005 19:24:09 GMT
"it_says_BALLS_on_your forehead" <simon.chao@xxxxxxx> wrote:
> Hey Xho, I tried this:
> ----
> #!/apps/webstats/bin/perl
>
> use File::Copy;
> use Parallel::ForkManager;
>
> my $pm = Parallel::ForkManager->new(5);
>
> $pm->run_on_start(
> sub { my ($pid,$ident)=@_;
> print "** $ident started, pid: $pid\n";
> }
> );
>
> my @data = 1 ... shift;
> for (@data) {
> my $pid = $pm->start and next;
> print "$pid: $_\n";
> $pm->finish;
> }
>
> $pm->wait_all_children;
> ------------
> and got this:
> #####
> [smro180 123] ~/simon/1-perl > tryFork.pl 10
> ** started, pid: 16208
> 0: 1
> ** started, pid: 16209
> 0: 2
> ** started, pid: 16210
....
>
> ...I read this:
> start [ $process_identifier ]
> This method does the fork. It returns the pid of the child process for
> the parent, and 0 for the child process. If the $processes parameter
> for the constructor is 0 then, assuming you're in the child process,
> $pm->start simply returns 0.
>
> An optional $process_identifier can be provided to this method... It is
> used by the "run_on_finish" callback (see CALLBACKS) for identifying
> the finished process.
>
> and this:
> run_on_start $code
> You can define a subroutine which is called when a child is started. It
> called after the successful startup of a child in the parent process.
>
> The parameters of the $code are the following:
>
> - pid of the process which has been started
> - identification of the process (if provided in the "start" method)
>
> ...but I don't understand why in my: print "$pid: $_\n";
> line, i'm getting 0 as the pid. I know the documentation said i should
> get 0 for the child process and the child pid for the parent, but
> aren't i calling start on the parent?
You are calling "start" *in* the parent, but it returns in both the
parent and the child process. Inside, "start" does a fork, so when "start"
returns there are two processes. The parent process gets the child's pid,
which is a true value, so the "and next" is activated. The child gets zero,
so the "and next" is not activated. This means everything between the start
and the finish statements is done in one of the children, not in the parent.
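The same two-way return can be seen with Perl's built-in fork, without the module. The following is only a minimal sketch: the helper name describe_fork_result and the three-item loop are made up for illustration, and it does not show how Parallel::ForkManager is implemented internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: name the process we are in, given fork's return value.
sub describe_fork_result {
    my ($ret) = @_;
    return 'fork failed' unless defined $ret;
    return $ret == 0 ? 'child' : 'parent';
}

$| = 1;    # unbuffered output, so nothing pending is copied into a child

for my $item (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    # Parent: fork returned the child's pid (a true value), so move on.
    next if $pid;

    # Child: fork returned 0; the child's own pid is in $$.
    print describe_fork_result($pid), " $$ handling item $item\n";
    exit 0;    # the child must exit, or it would keep running the loop
}

# Parent only: reap every child before exiting.
1 while wait() != -1;
```

The order of the output lines is not guaranteed, since the children run concurrently.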
The example I posted was just copied and modified from perldoc, and for
some reason they do capture the pid. In practise I almost never capture
it:
$pm->start and next;
If the child needs its own pid, it gets it from $$. Why do I need
the parent to know the child's pid? Usually I don't, because the module
itself takes care of all the waiting and stuff for me.
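For illustration, here is the quoted test script reworked along those lines, as a sketch that assumes Parallel::ForkManager is installed from CPAN. The work_items helper and the default count of 3 are hypothetical additions; only new, start, finish, and wait_all_children are used from the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical helper: the list of work items 1..N.
sub work_items {
    my ($count) = @_;
    return (1 .. $count);
}

my $count = @ARGV ? shift @ARGV : 3;    # default of 3 is an assumption
my $pm    = Parallel::ForkManager->new(5);

$| = 1;    # unbuffered output, so nothing pending is copied into a child

for my $item (work_items($count)) {
    # Parent gets a true pid and skips ahead; the child gets 0 and falls through.
    $pm->start and next;

    # Child: report our own pid from $$ rather than start's return value.
    print "$$: $item\n";

    $pm->finish;    # the child exits here
}

$pm->wait_all_children;    # parent blocks until every child has finished
```

Each line of output now carries the pid of the child that printed it; the order of the lines can vary from run to run.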
I rarely use anything except new, start, finish, and wait_all_children,
except to goof around with. Once your needs get more complicated than
those simple methods, I find that things get hairy real quick.
BTW, I'm curious about the bottleneck in your code. If your code is
CPU-bound, then parallelization to 20 processes won't help much unless you
have 20 CPUs. If it is disk-drive bound, then parallelization won't help
unless your files are on different disks (and probably on different
controllers.)
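One rough way to check which case applies is to compare wall-clock time with CPU time for a representative piece of work. This is a heuristic sketch, not a profiler: the measure helper is hypothetical, and the loop stands in for whatever the real log parsing does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a code ref and return (wall seconds, CPU seconds).
# CPU time sums user and system time for this process and its reaped children.
sub measure {
    my ($code) = @_;
    my $wall_start = time;
    my @before     = times;    # (user, system, child user, child system)
    $code->();
    my @after = times;
    my $wall  = time - $wall_start;
    my $cpu   = 0;
    $cpu += $after[$_] - $before[$_] for 0 .. 3;
    return ($wall, $cpu);
}

# Stand-in workload: pure computation, no I/O.
my ($wall, $cpu) = measure(sub {
    my $n = 0;
    $n += length("line $_") for 1 .. 200_000;
});

printf "wall %.3fs, cpu %.3fs\n", $wall, $cpu;
```

If CPU time is close to wall time, the work is probably CPU-bound; if CPU time is far below wall time, the process spent most of its time waiting, typically on disk or network.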
Xho