Suppose I have a general matrix and a vector and I want to multiply
them. Does anyone know how the performance of the Fortran 90/95
intrinsics (matmul, dot_product, etc.) compares to the BLAS/LAPACK
routines for general matrices (_gemv, _nrm2, etc.)? What if the matrix
is symmetric? I would guess that the special structure of the matrix
then makes the BLAS/LAPACK routines more efficient.

For small matrices the matmul intrinsic is fine. For large matrices
(in a serial code) you would want a multithreaded BLAS such as Intel's
MKL (level 2 routines for matrix-vector products, level 3 for
matrix-matrix products). I don't know whether matmul is multithreaded
in any of the compilers. LAPACK is built on top of BLAS, so a faster or
multithreaded BLAS means a faster LAPACK.

Note: I generally don't work with large dense matrices.
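
On the symmetric question, here is a rough sketch of why a symmetric routine can get away with less work on the memory side. It is plain Python on nested lists, meant only to show the access pattern, not to benchmark anything. The names gemv and symv_upper are made up for this post and are not the real BLAS interfaces (the real routines take leading dimensions, strides, alpha/beta and so on). The assumption is that a symmetric routine is told to use one triangle only, so it reads roughly half of the matrix entries for the same number of multiply-adds.

```python
def gemv(a, x):
    """General product y = A x; reads every entry of A."""
    n = len(x)
    y = [0.0] * len(a)
    for i, row in enumerate(a):
        s = 0.0
        for j in range(n):
            s += row[j] * x[j]
        y[i] = s
    return y


def symv_upper(a, x):
    """Symmetric product y = A x; reads only the upper triangle of A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] += a[i][i] * x[i]        # diagonal entry
        for j in range(i + 1, n):
            aij = a[i][j]             # one read of A(i,j) ...
            y[i] += aij * x[j]        # ... used for row i
            y[j] += aij * x[i]        # ... and for row j, since A(j,i) = A(i,j)
    return y


# Small symmetric example (illustrative data only).
a = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
x = [1.0, 2.0, 3.0]

print(gemv(a, x))
print(symv_upper(a, x))
```

Both functions give the same result for a symmetric matrix. Whether the reduced memory traffic turns into a real speedup depends on the BLAS implementation and the matrix size, so it is worth timing on your own machine.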

