Re: optimize log parsing



"it_says_BALLS_on_your forehead" <simon.chao@xxxxxxx> wrote:
> Hey Xho, I tried this:
> ----
> #!/apps/webstats/bin/perl
>
> use File::Copy;
> use Parallel::ForkManager;
>
> my $pm = Parallel::ForkManager->new(5);
>
> $pm->run_on_start(
> sub { my ($pid,$ident)=@_;
> print "** $ident started, pid: $pid\n";
> }
> );
>
> my @data = 1 ... shift;
> for (@data) {
> my $pid = $pm->start and next;
> print "$pid: $_\n";
> $pm->finish;
> }
>
> $pm->wait_all_children;
> ------------
> and got this:
> #####
> [smro180 123] ~/simon/1-perl > tryFork.pl 10
> ** started, pid: 16208
> 0: 1
> ** started, pid: 16209
> 0: 2
> ** started, pid: 16210
> ....
>
> ...I read this:
> start [ $process_identifier ]
> This method does the fork. It returns the pid of the child process for
> the parent, and 0 for the child process. If the $processes parameter
> for the constructor is 0 then, assuming you're in the child process,
> $pm->start simply returns 0.
>
> An optional $process_identifier can be provided to this method... It is
> used by the "run_on_finish" callback (see CALLBACKS) for identifying
> the finished process.
>
> and this:
> run_on_start $code
> You can define a subroutine which is called when a child is started. It
> called after the successful startup of a child in the parent process.
>
> The parameters of the $code are the following:
>
> - pid of the process which has been started
> - identification of the process (if provided in the "start" method)
>
> ...but I don't understand why in my: print "$pid: $_\n";
> line, i'm getting 0 as the pid. I know the documentation said i should
> get 0 for the child process and the child pid for the parent, but
> aren't i calling start on the parent?

You are calling "start" *in* the parent, but it returns in both the
parent and the child process. Inside, "start" does a fork, so when "start"
returns there are two processes. The parent process gets the child's pid,
which is true, so the "and next" is activated. The child gets zero, which
is false, so the "and next" is not activated. This means everything between
the start and the finish statements is done in one of the children, not in
the parent. That is why your print shows 0: it runs in the child.
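
Here is a minimal sketch of the same thing with a bare fork, which is roughly what "start" does inside, minus the bookkeeping. fork_report is a name made up for this illustration; it is not part of Parallel::ForkManager. The child sends back over a pipe what fork handed it, along with its own $$:

```perl
use strict;
use warnings;

# fork_report is a hypothetical helper, not part of Parallel::ForkManager.
# It forks once and returns three values:
#   what fork returned in the parent, what it returned in the child,
#   and the child's own pid as seen through $$.
sub fork_report {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: fork returned 0 here; the real pid is in $$.
        close $reader;
        print {$writer} "$pid $$\n";
        close $writer;
        exit 0;
    }

    # Parent: fork returned the child's pid.
    close $writer;
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);

    chomp $line;
    my ($seen_by_child, $child_self) = split ' ', $line;
    return ($pid, $seen_by_child, $child_self);
}

my ($in_parent, $in_child, $child_self) = fork_report();
print "parent got $in_parent, child got $in_child, child's \$\$ was $child_self\n";
```

The value the parent gets should match the child's $$, and the child should report 0, which is the same 0 you saw in your output.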

The example I posted was just copied and modified from perldoc, and for
some reason they do capture the pid. In practise I almost never capture
it:

$pm->start and next;

If the child needs its own pid, it gets it from $$. Why do I need
the parent to know the child's pid? Usually I don't, because the module
itself takes care of all the waiting and stuff for me.
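
As a sketch, your test loop written that way might look like the following. It assumes Parallel::ForkManager is installed, and it uses a fixed 1 .. 10 where your script used shift. The order of the children's output is not predictable.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $parent_pid = $$;
my $pm = Parallel::ForkManager->new(5);   # at most 5 children at a time

for my $item (1 .. 10) {
    # Parent: start returns the child's pid (true), so skip to the next item.
    # Child: start returns 0 (false), so fall through to the work below.
    $pm->start and next;

    print "$$: $item\n";   # runs in the child; $$ is the child's own pid
    $pm->finish;           # the child exits here
}

$pm->wait_all_children;    # the parent blocks until every child has exited
print "parent $parent_pid: all children done\n";
```

Only the parent ever reaches the last two lines, because each child exits at finish.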

I rarely use anything except new, start, finish, and wait_all_children,
except to goof around with. Once your needs get more complicated than
those simple methods, I find that things get hairy real quick.

BTW, I'm curious about the bottleneck in your code. If your code is
CPU-bound, then parallelization to 20 processes won't help much unless you
have 20 CPUs. If it is disk-drive bound, then parallelization won't help
unless your files are on different disks (and probably on different
controllers.)

Xho



