Re: How to Make F77 Program Faster (g95 compiler) ??



In article <1195012159.842195.235580@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
monir <monirg@xxxxxxxxxxxx> writes:

Thank you all for your replies.


It would have been better to reply to each individual post
because you've completely lost context.

a) No, I never tried a "profiler". As a matter of fact, I don't even
know what it is! Is it a utility program or a debugging tool ?? Does
it work with F77 / g95 ??

A profiler tells you where the bottlenecks are in your code. I don't
know what OS you're using, but if it's Unix-like (e.g., Linux), then
try adding -pg to the g95 command line. Note, you probably want
to recompile all routines with that option. Run the resulting
executable. This should produce a file named something like gmon.out.
You then do

gprof -b -l <exec> gmon.out | more

where <exec> is the name of the executable. If g95 doesn't support
-pg, I know gfortran does. I'd be surprised if Intel et al did not
have a similar option.
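
If you want to try the profiler on something small first, here is a
minimal sketch of a test program. CHEAP and COSTLY are made-up
routines, not part of the original code, and the loop counts are
arbitrary. Compiled with -pg and run as described above, the gprof
listing should attribute nearly all of the time to COSTLY.

```fortran
      PROGRAM PROFDM
C     Hypothetical test program: CHEAP and COSTLY are made-up
C     routines used only to give the profiler something to show.
      DOUBLE PRECISION s1, s2
      INTEGER i
      s1 = 0.0D0
      s2 = 0.0D0
      DO 10 i = 1, 2000
         CALL CHEAP(i, s1)
         CALL COSTLY(i, s2)
   10 CONTINUE
      PRINT *, s1, s2
      END

      SUBROUTINE CHEAP(i, s)
C     One addition per call.
      INTEGER i
      DOUBLE PRECISION s
      s = s + DBLE(i)
      END

      SUBROUTINE COSTLY(i, s)
C     An inner loop with one SIN evaluation per iteration.
      INTEGER i, j
      DOUBLE PRECISION s
      DO 20 j = 1, 20000
         s = s + SIN(DBLE(i)*DBLE(j))
   20 CONTINUE
      END
```

Both routines are called the same number of times, so the call counts
alone tell you nothing; it is the time per routine in the profile that
points at the bottleneck.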

The routine in question (Sub EFFECT5) could easily be identified as
the culprit based on the analytical & numerical models, as well as by
monitoring the progress during the execution of the program.

There are probably several others in c.l.f (including me) that
can tell you horror stories about optimizing the wrong routine
because we just knew it had to be a bottleneck. See Richard's
comment about picking a proper algorithm.

b) Yes, there're nested IF within nested DO loops.

This is going to kill performance; branches inside the inner loops
make it hard for the compiler to optimize them.

d) No, I've never tried other Fortran compilers (Intel, Pathscale,
gfortran, etc.). Just over a year ago, I switched from MS Fortran v
5.1 to g95, and subsequently re-compiled all my FORTRAN programs. It
was a time consuming & difficult task, but it was absolutely
necessary!

Google "Polyhedron Benchmark". Of course, the only important
benchmark results are the ones involving your code. OTOH, Polyhedron
may give you guidance in picking additional compilers.

(Code removed)

res2=res21+res22+1.0D0/(rv*tanbin)**3.0D0*(C1T3)

You may want to get rid of the double precision exponent. Some
compilers will translate the above into exp(3.d0*log(yada)), which
can be expensive. This probably won't be a magic bullet, but try

res2=res21+res22+1.0D0/(rv*tanbin)**3*(C1T3)

This will be translated into 2 multiplications (x*x*x).
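
A small sketch to check that the two forms agree, assuming made-up
sample values (the variable names follow the quoted code; the numbers
are not from the original program):

```fortran
      PROGRAM POWDEM
C     Sample values are hypothetical, chosen only for illustration.
      DOUBLE PRECISION rv, tanbin, c1t3, base, slow, fast
      rv = 1.5D0
      tanbin = 0.75D0
      c1t3 = 2.0D0
      base = rv*tanbin
C     Real exponent: may be evaluated as exp(3.d0*log(base)).
      slow = 1.0D0/base**3.0D0*c1t3
C     Integer exponent: can be expanded to base*base*base.
      fast = 1.0D0/base**3*c1t3
      PRINT *, 'real exponent:    ', slow
      PRINT *, 'integer exponent: ', fast
      PRINT *, 'difference:       ', ABS(slow - fast)
      END
```

Any difference should be at the level of rounding error. Whether the
integer form is actually faster depends on the compiler and the
optimization flags, so time it in the real routine.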

res3=res31+res32+1.0D0/(rv*tanbin)**3.0D0*(S1T3)

ditto

res4=res41+res42
1 + 1.0D0/(rv*tanbin)**3.0D0*(C1T2)
2 - 3.0D0*(xp-xv)/(rv*tanbin)**4*(C1T3) -
3 3.0D0*(-4.0D0*(xp-xv)**2+rv**2+rp**2)/2.0D0/
4 (rv*tanbin)**5*(C1T4)

ditto

res5=res51+res52
1 + 1.0D0/(rv*tanbin)**3.0D0*(S1T2)
2 - 3.0D0*(xp-xv)/(rv*tanbin)**4*(S1T3) -
3 3.0D0*(-4.0D0*(xp-xv)**2+rv**2+rp**2)/2.0D0/
4 (rv*tanbin)**5*(S1T4)

ditto
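
For the longer expressions, "ditto" could look something like the
sketch below. It goes one step beyond replacing the exponent: since
rv*tanbin shows up raised to the 3rd, 4th and 5th power, the reciprocal
powers are formed once and reused. That extra step is my assumption
about what might help, not something measured on the original code,
and all the numeric values are made up.

```fortran
      PROGRAM RES4DM
C     All numeric values below are made up for illustration; the
C     variable names follow the quoted code.
      DOUBLE PRECISION rv, rp, xp, xv, tanbin
      DOUBLE PRECISION c1t2, c1t3, c1t4, res41, res42, res4
      DOUBLE PRECISION rt, r3, r4, r5
      rv = 1.5D0
      rp = 1.2D0
      xp = 0.4D0
      xv = 0.1D0
      tanbin = 0.75D0
      c1t2 = 1.0D0
      c1t3 = 2.0D0
      c1t4 = 3.0D0
      res41 = 0.0D0
      res42 = 0.0D0
C     Form the reciprocal powers once, without any ** operator.
      rt = rv*tanbin
      r3 = 1.0D0/(rt*rt*rt)
      r4 = r3/rt
      r5 = r4/rt
      res4 = res41 + res42
     1     + r3*c1t2
     2     - 3.0D0*(xp-xv)*r4*c1t3
     3     - 3.0D0*(-4.0D0*(xp-xv)**2+rv**2+rp**2)/2.0D0*r5*c1t4
      PRINT *, 'res4 = ', res4
      END
```

The same r3, r4 and r5 can be reused for res2, res3 and res5. The
result may differ from the original in the last few bits because the
operations are reordered, and a good optimizer may already do this on
its own, so profile before and after.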

--
Steve
http://troutmask.apl.washington.edu/~kargl/