Performance degradation from PGF90-Xeon to Compaq F90-Alpha

From: denis (dpithon_at_freenospam.fr)
Date: 07/30/04


Date: Fri, 30 Jul 2004 16:43:19 +0200

Hello all,

My question:
Am I missing some optimization options for Compaq F90?

Background:
We run a wave model on a dual-Xeon 2.4 GHz machine with 2 GB of RAM. It was
compiled with pgf90, using the -omp, -fastsse and -piv options.

We then installed the model on a Compaq cluster (up to 4 nodes usable at
once). Each node is a quad-Alpha (EV68 CPUs, 1 GHz) with 4 GB of RAM. We
compiled it with Compaq F90 using the -fast option (which seems to be an
alias for "-align dcommons -arch host -assume noaccuracy_sensitive
-math_library fast -O4 -tune host"), and we tested it with OpenMP and with MPI.

The "problem":
The "problem" is that the dual-Xeon is the fastest. Here are the elapsed
times of the same run launched with different parallel protocols (OpenMP
and MPI) on the Xeon and on the Alpha:

- With OpenMP (4 threads)
dual-Xeon : 0h 47m (without -fastsse: 01h 46m)
quad-Alpha : 2h 49m

- With MPI (processes are dispatched across multiple nodes of the cluster,
1 process per CPU)
 8 Alpha CPUs : 1h 25m
12 Alpha CPUs : 1h 02m
16 Alpha CPUs : 0h 46m
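
To put the timings above side by side, here is a small Python sketch that converts them to minutes and derives the ratios between runs. It uses only the figures quoted in this post; the helper names (minutes, scaling) are made up for illustration, and the MPI efficiency is measured relative to the 8-CPU run because no single-CPU time is available.

```python
# Elapsed times quoted in the post, converted to minutes.
def minutes(h, m):
    return 60 * h + m

xeon_omp = minutes(0, 47)         # dual-Xeon, OpenMP, with -fastsse
xeon_omp_no_sse = minutes(1, 46)  # dual-Xeon, OpenMP, without -fastsse
alpha_omp = minutes(2, 49)        # quad-Alpha, OpenMP
alpha_mpi = {8: minutes(1, 25), 12: minutes(1, 2), 16: minutes(0, 46)}

def scaling(times, base):
    """Return {cpus: (speedup, efficiency)} relative to the `base` CPU count."""
    t0 = times[base]
    result = {}
    for n, t in sorted(times.items()):
        speedup = t0 / t
        ideal = n / base  # perfect scaling from the baseline CPU count
        result[n] = (speedup, speedup / ideal)
    return result

sse_gain = xeon_omp_no_sse / xeon_omp  # how much -fastsse helps on the Xeon
alpha_vs_xeon = alpha_omp / xeon_omp   # quad-Alpha OpenMP vs dual-Xeon OpenMP
mpi_scaling = scaling(alpha_mpi, 8)

if __name__ == "__main__":
    print(f"-fastsse gain on Xeon:        {sse_gain:.2f}x")
    print(f"quad-Alpha / dual-Xeon (OMP): {alpha_vs_xeon:.2f}x")
    for n, (s, e) in mpi_scaling.items():
        print(f"MPI {n:2d} CPUs: speedup {s:.2f}x vs 8 CPUs, efficiency {e:.0%}")
```

Read this way, -fastsse is worth roughly a factor of 2.3 on the Xeon, the quad-Alpha OpenMP run is about 3.6 times slower than the dual-Xeon one, and the MPI runs keep about 92% efficiency going from 8 to 16 CPUs. The MPI scaling itself looks healthy, so the gap appears to come from per-CPU throughput rather than from the parallelization.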

In fact, the code performs many floating-point operations and is heavily
designed for vectorization. It seems to benefit from the Xeon's SIMD
support. But I'm quite disappointed with this result. Am I missing some
optimization options for Compaq F90?

Thanks for any light you can shed on this.

Denis