Re: optimize log parsing
- From: "it_says_BALLS_on_your forehead" <simon.chao@xxxxxxx>
- Date: 5 Oct 2005 13:06:02 -0700
xhos...@xxxxxxxxx wrote:
> "it_says_BALLS_on_your forehead" <simon.chao@xxxxxxx> wrote:
> > Hey Xho, I tried this:
> > ----
> > #!/apps/webstats/bin/perl
> >
> > use File::Copy;
> > use Parallel::ForkManager;
> >
> > my $pm = Parallel::ForkManager->new(5);
> >
> > $pm->run_on_start(
> > sub { my ($pid,$ident)=@_;
> > print "** $ident started, pid: $pid\n";
> > }
> > );
> >
> > my @data = 1 .. shift;
> > for (@data) {
> > my $pid = $pm->start and next;
> > print "$pid: $_\n";
> > $pm->finish;
> > }
> >
> > $pm->wait_all_children;
> > ------------
> > and got this:
> > #####
> > [smro180 123] ~/simon/1-perl > tryFork.pl 10
> > ** started, pid: 16208
> > 0: 1
> > ** started, pid: 16209
> > 0: 2
> > ** started, pid: 16210
> ...
> >
> > ...I read this:
> > start [ $process_identifier ]
> > This method does the fork. It returns the pid of the child process for
> > the parent, and 0 for the child process. If the $processes parameter
> > for the constructor is 0 then, assuming you're in the child process,
> > $pm->start simply returns 0.
> >
> > An optional $process_identifier can be provided to this method... It is
> > used by the "run_on_finish" callback (see CALLBACKS) for identifying
> > the finished process.
> >
> > and this:
> > run_on_start $code
> > You can define a subroutine which is called when a child is started. It
> > is called after the successful startup of a child in the parent process.
> >
> > The parameters of the $code are the following:
> >
> > - pid of the process which has been started
> > - identification of the process (if provided in the "start" method)
> >
> > ...but I don't understand why in my: print "$pid: $_\n";
> > line, i'm getting 0 as the pid. I know the documentation said i should
> > get 0 for the child process and the child pid for the parent, but
> > aren't i calling start on the parent?
>
> You are calling "start" *in* the parent, but it returns in both the
> parent and the child process. Inside, "start" does a fork, so when "start"
> ends there are two processes. The parent process gets the child's pid,
> which means the "and next" is activated. The child gets zero, so the "and
> next" is not activated. This means everything between the start and the
> finish statements is done in one of the children, not in the parent.
>
> The example I posted was just copied and modified from perldoc, and for
> some reason they do capture the pid. In practise I almost never capture
> it:
>
> $pm->start and next;
>
> If the child needs its own pid, it gets it from $$. Why do I need
> the parent to know the child's pid? Usually I don't, because the module
> itself takes care of all the waiting and stuff for me.
>
> I rarely use anything except new, start, finish, and wait_all_children,
> except to goof around with. Once your needs get more complicated than
> those simple methods, I find that things get hairy real quick.
>
> BTW, I'm curious about the bottleneck in your code. If your code is
> CPU-bound, then parallelization to 20 processes won't help much unless you
> have 20 CPUs. If it is disk-drive bound, then parallelization won't help
> unless your files are on different disks (and probably on different
> controllers.)
>
> Xho
>
ahh, that makes sense, thanks!
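
for anyone following along, here is a minimal sketch of that point using plain fork() instead of Parallel::ForkManager, so it needs nothing beyond core Perl. The helper names describe_fork_result and child_sees_zero are made up for illustration; this is not how the module is written internally, just the same "one call, two returns" behaviour in isolation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify fork's return value from the point of view of whoever receives it.
# (Hypothetical helper, for illustration only.)
sub describe_fork_result {
    my ($ret) = @_;
    return 'fork failed' unless defined $ret;
    return $ret == 0 ? 'child' : 'parent';
}

# Fork once. The child reports, through its exit status, whether it saw 0.
sub child_sees_zero {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: fork returned 0 here; the child's real pid is in $$.
        exit(describe_fork_result($pid) eq 'child' ? 0 : 1);
    }

    # Parent: fork returned the child's pid, so we can wait on it.
    waitpid($pid, 0);
    return ($? >> 8) == 0;
}

unless (caller) {
    my $msg = child_sees_zero() ? 'child saw 0' : 'unexpected';
    print "$msg\n";
}
```

so the "0: 1" lines in my output came from the children, where start had returned 0, and the parent was the one that took the "and next".
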
to answer your question, i'm working on a box with 16 CPUs. the number
20 is from code that i inherited from a predecessor. there used to be
10 processes, and he changed it to 20, and it went faster, so 20 it
stayed. should i change it to 16?
also, what's the difference between using Parallel::ForkManager to do
20 tasks, and looping through system('script.pl &') 20 times? i mean, i
see an advantage in that with ForkManager, when one process dies,
another takes its place so you don't need to pre-ordain which process
does which work. but let's assume that each process has exactly the
same amount of work and processes that work with the same speed. would
ForkManager be faster? Is there ever a case where multiple system()
calls are the answer?
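
to make the comparison concrete: system('script.pl &') returns as soon as the shell has put the job in the background, so the calling script never waits on script.pl or sees its exit status. A forking manager keeps the children as its own, caps how many run at once, and reaps them. Below is a rough sketch of that bookkeeping in core Perl. The name run_limited is hypothetical, and this is only an approximation of the idea, not Parallel::ForkManager's actual implementation.

```perl
use strict;
use warnings;

# Run $work->($item) in a forked child for each item, keeping at most
# $max children alive at once. Returns how many children exited with
# status 0. (Hypothetical helper, for illustration only.)
sub run_limited {
    my ($max, $work, @items) = @_;
    my ($running, $ok) = (0, 0);

    # Block until any one child exits, then record its result.
    my $reap = sub {
        my $pid = wait();
        if ($pid == -1) {       # no children left to wait for
            $running = 0;
            return;
        }
        $running--;
        $ok++ if $? == 0;
    };

    for my $item (@items) {
        # Throttle: do not start a new child while $max are running.
        $reap->() while $running >= $max;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {        # child: do the work, then leave
            $work->($item);
            exit 0;
        }
        $running++;             # parent: one more child in flight
    }

    # Wait for whatever is still running.
    $reap->() while $running > 0;
    return $ok;
}
```

with something like this, a free slot is handed to the next item as soon as any child finishes, which is the "another takes its place" behaviour i mentioned above.
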