Re: Poor performance with OpenMP

About the question: right now, I think you are partially measuring the
difference between making a single MATMUL call and making N MATMUL calls
in a loop (plus threading/parallel processing).

     ! one MATMUL call on the whole matrices
     C = MATMUL(A,B)

     ! versus N MATMUL calls, one per index j
     do j = 1,N
         C(j,:) = MATMUL(A,B(j,:))
     end do

Just to see something "interesting", you could time ex2 compiled without
-fopenmp (it compiles, but does not use OpenMP), or run it with
export OMP_NUM_THREADS=1.

Indeed. When I do as you suggest, the two programs have basically the
same runtime. The single-threaded version is still faster, but only by
about 1 second out of ~18.

Lesson: parallel processing might be worse if it forces you to use a
poorer algorithm. In this instance, I guess that the do loop might
prevent the compiler from doing some optimizations, such as making sure
memory is accessed sequentially.
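
To make the memory-access point concrete: Fortran stores arrays in
column-major order, so a row slice such as B(j,:) is strided in memory,
while a column slice B(:,j) is contiguous. Below is a minimal sketch, not
the original ex2, that loops over columns and checks the result against a
single MATMUL call. The program name, N = 200, and the random fill are my
assumptions. Whether strided access is really what slows the original
loop down is still only a guess.

```fortran
program matmul_columns
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: N = 200      ! assumed size, for illustration only
    real(dp), allocatable :: A(:,:), B(:,:), C(:,:), C_ref(:,:)
    integer :: j

    allocate(A(N,N), B(N,N), C(N,N), C_ref(N,N))
    call random_number(A)
    call random_number(B)

    ! Reference result: one MATMUL call on the whole matrices.
    C_ref = MATMUL(A, B)

    ! Column j of A*B equals A times column j of B. The slices B(:,j)
    ! and C(:,j) are contiguous in memory. The directive below is
    ! treated as a plain comment when compiling without -fopenmp.
    !$omp parallel do
    do j = 1, N
        C(:,j) = MATMUL(A, B(:,j))
    end do
    !$omp end parallel do

    ! Should be zero or at rounding level.
    print *, 'max abs difference:', maxval(abs(C - C_ref))
end program matmul_columns
```

Timing this variant with and without -fopenmp, next to the row-wise
loop, would show how much of the gap comes from the access pattern.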

A side question: how are you measuring runtime?

I'm using the Unix "time" command and I'm reporting the total wall
clock time spent by the user.
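
The distinction matters with OpenMP: "time" prints real (wall-clock),
user, and sys times, and the user figure is CPU time summed over all
threads, so it can exceed the wall-clock time. Below is a sketch that
measures both from inside the program with the standard intrinsics
system_clock and cpu_time. The program name and N = 500 are assumptions,
and what cpu_time counts for a threaded program depends on the compiler.

```fortran
program time_matmul
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: i8 = selected_int_kind(18)
    integer, parameter :: N = 500      ! assumed size, for illustration only
    real(dp), allocatable :: A(:,:), B(:,:), C(:,:)
    integer(i8) :: t0, t1, rate
    real(dp) :: c0, c1

    allocate(A(N,N), B(N,N), C(N,N))
    call random_number(A)
    call random_number(B)

    call system_clock(t0, rate)        ! wall-clock ticks and ticks per second
    call cpu_time(c0)                  ! processor time in seconds
    C = MATMUL(A, B)
    call cpu_time(c1)
    call system_clock(t1)

    ! Print a checksum so the compiler cannot discard the product.
    print *, 'checksum:      ', sum(C)
    print *, 'wall clock (s):', real(t1 - t0, dp) / real(rate, dp)
    print *, 'CPU time (s):  ', c1 - c0
end program time_matmul
```

For a multi-threaded run, the wall-clock figure is the one to compare
against the "real" line printed by "time".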