Re: MATMUL slower than expected

In article <43DDC1C4.1000708@xxxxxxxxxxxxxxxxxx>,
Tim Prince <tprince@xxxxxxxxxxxxxxxxxx> writes:
>
> gfortran 4.2 on 1.5Ghz Pentium-m (SuSE 9.2):
> time using matmul: 11.13631
> triple loop time: 9.561546
> double loop with dot_product: 11.52825
> ifort 9.1 -xB -O3:
> time using matmul: 2.434629
> triple loop time: 2.432630
> double loop with dot_product: 14.08286
>
> Your choice of loop nesting seems particularly unfavorable when you
> instruct compilers not to try variations. gfortran has no option to do
> so. ifort optimized code does take advantage of stride 1 storage by column.
> On Pentium-m, there isn't much advantage in MKL over optimized Fortran
> source, but Xeon platforms would show more gain for MKL. As you must be
> aware, the "funny" f77 oriented BLAS calling sequences are well
> entrenched in practice, but you could employ the f90 wrappers.
> gfortran still has the documented peculiarity that sum(a*b) is
> significantly faster than dot_product(a,b).
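
On the loop nesting point quoted above: a minimal sketch, not the original
poster's code, of a triple loop ordered so that the innermost index runs
down a column. Fortran stores arrays by column, so this makes the inner
loop stride 1. The program name, the size n = 200, and the random test
data are assumptions for illustration only, not the setup behind the
quoted timings.

```fortran
program loop_order
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200          ! hypothetical problem size
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: i, j, k

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! j-k-i ordering: the innermost loop varies the first index of
  ! c and a, which is contiguous in Fortran's column-major storage.
  c = 0.0_dp
  do j = 1, n
     do k = 1, n
        do i = 1, n
           c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
     end do
  end do

  ! Sanity check against the intrinsic; the difference should be
  ! at rounding-error level.
  print *, 'max |c - matmul(a,b)| =', maxval(abs(c - matmul(a, b)))
end program loop_order
```

Swapping the i and j loops gives the same result with a stride-n inner
loop, which is the kind of ordering a compiler has to rearrange itself
if it is allowed to.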

There is a patch to improve dot_product (and matmul) in gfortran.
The author of the patch isn't quite happy with the current choice
of a switch-over point from one algorithm to another.
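
For anyone who wants to check the sum(a*b) versus dot_product(a,b)
difference on their own build, here is a rough timing sketch. The vector
length, the repeat count, and the use of system_clock are my assumptions,
not part of the patch or of the quoted test. What it shows will depend on
compiler version and options.

```fortran
program dot_vs_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000      ! hypothetical vector length
  integer, parameter :: nrep = 100       ! hypothetical repeat count
  real(dp), allocatable :: a(:), b(:)
  real(dp) :: s1, s2
  integer :: r, t0, t1, t2, rate

  allocate(a(n), b(n))
  call random_number(a)
  call random_number(b)

  ! Time the intrinsic dot_product.
  s1 = 0.0_dp
  call system_clock(t0, rate)
  do r = 1, nrep
     s1 = s1 + dot_product(a, b)
  end do
  call system_clock(t1)

  ! Time the equivalent array expression.
  s2 = 0.0_dp
  do r = 1, nrep
     s2 = s2 + sum(a * b)
  end do
  call system_clock(t2)

  print *, 'dot_product time (s):', real(t1 - t0, dp) / real(rate, dp)
  print *, 'sum(a*b) time (s):   ', real(t2 - t1, dp) / real(rate, dp)
  ! Printing the results keeps the compiler from discarding the loops.
  print *, 'difference in results:', abs(s1 - s2)
end program dot_vs_sum
```

An optimizer may hoist the loop-invariant product out of the repeat loop,
so treat the numbers as indicative only and compare at the optimization
level you actually use.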

--
Steve
http://troutmask.apl.washington.edu/~kargl/