Re: Intrinsic matmul vs. LAPACK

Suppose I have a general matrix and a vector and I want to multiply
them.  Does anyone know how the performance of the Fortran 90/95
intrinsics (matmul, dot_product, etc.) compares to the LAPACK routines
for general matrices (_gemv, _nrm2, etc.)?  What about if the matrix
is symmetric?  I would guess that then the special structure of the
matrix makes the LAPACK routines more efficient.

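For small matrices the matmul intrinsic is fine. For large matrices
(in a serial code) you would want an optimized, multithreaded BLAS
such as Intel's MKL. Strictly speaking, a matrix-vector product
(_gemv) is a Level 2 BLAS operation; Level 3 covers matrix-matrix
products (_gemm). I don't know if matmul is multithreaded in any of
the compilers.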
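
The simplest way to answer this for a given compiler and BLAS is to
time both. Below is a rough sketch, not a careful benchmark. It
assumes double precision, that a BLAS library providing dgemv and
dsymv is linked in (e.g. something like "gfortran matvec_compare.f90
-lblas", or MKL), and a matrix size n = 2000 that I picked purely for
illustration. The program name and variable names are my own.

```fortran
program matvec_compare
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000        ! illustrative size (assumption)
  real(dp), allocatable :: a(:,:), x(:), y1(:), y2(:), y3(:)
  integer :: i, j
  integer(int64) :: c0, c1, rate
  external :: dgemv, dsymv              ! supplied by the linked BLAS

  allocate(a(n,n), x(n), y1(n), y2(n), y3(n))
  call random_number(a)
  call random_number(x)

  ! Copy the upper triangle into the lower one so that a is symmetric
  ! and the dsymv result can be compared with the general routines.
  do j = 1, n
     do i = j + 1, n
        a(i,j) = a(j,i)
     end do
  end do

  ! 1) Fortran intrinsic
  call system_clock(c0, rate)
  y1 = matmul(a, x)
  call system_clock(c1)
  print *, 'matmul time (s):', real(c1 - c0, dp) / real(rate, dp)

  ! 2) BLAS general matrix-vector product: y2 = 1.0*a*x + 0.0*y2
  call system_clock(c0)
  call dgemv('N', n, n, 1.0_dp, a, n, x, 1, 0.0_dp, y2, 1)
  call system_clock(c1)
  print *, 'dgemv  time (s):', real(c1 - c0, dp) / real(rate, dp)

  ! 3) BLAS symmetric matrix-vector product, reading only the upper
  !    triangle of a: y3 = 1.0*a*x + 0.0*y3
  call system_clock(c0)
  call dsymv('U', n, 1.0_dp, a, n, x, 1, 0.0_dp, y3, 1)
  call system_clock(c1)
  print *, 'dsymv  time (s):', real(c1 - c0, dp) / real(rate, dp)

  ! The three results should agree to within rounding error.
  print *, 'max |matmul - dgemv|:', maxval(abs(y1 - y2))
  print *, 'max |matmul - dsymv|:', maxval(abs(y1 - y3))
end program matvec_compare
```

A single timed call like this is noisy, so repeat each call several
times and try a range of n before drawing conclusions. The numbers
will depend heavily on the compiler, optimization flags, and which
BLAS you link against.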

LAPACK is built on top of BLAS (_gemv and _nrm2 are themselves BLAS
routines, not LAPACK ones), so a faster/multithreaded BLAS would also
mean a faster LAPACK.

Note: I generally don't work with large dense matrices.