MATMUL slower than expected
- From: "Matthew Halfant" <halfant@xxxxxxx>
- Date: Mon, 30 Jan 2006 04:45:53 GMT
On a few occasions I've converted Matlab code to Fortran 95 for
performance, and gotten speedup factors of 200 or more. I just went
through such an exercise and surprisingly took a hit of more than a
factor of 20. Profiling on both sides showed that all the time
(Matlab and Fortran) was being spent in a single statement that
evaluated a product of three large matrices. Algorithm execution
speeds were in the same ratio as the respective matrix multiplication
speeds.
Running on a 2.4 GHz P4 under Win2k, Matlab multiplies two 1000 x 1000
matrices in about half a second:
>> a = rand(1000,1000); b = rand(1000,1000);
>> tic; c = a*b; toc
Elapsed time is 0.513914 seconds.
>>
The f95 code below, which I ran on a few compilers, shows that MATMUL
takes significantly longer for the same task, the least unfavorable
showing being Lahey's, which is only slower by a factor of 3. g95
comes next, with a factor of 8. Intel's compiler, which I'd been
using, is slower by a factor of 26.
Interestingly, a couple of the compilers are faster running explicit
loops than using MATMUL. Here are the results for each compiler
tested, in the program order of MATMUL, triple-loop, and double-loop
using DOT_PRODUCT:
                 (CPU seconds)
Compiler         MATMUL      triple loop   DOT_PRODUCT
Salford ftn95:   13.7969     16.1563       14.9219
Compaq VF:       13.84375     9.640625      9.562500
GNU g95:          4.125      12.84375       9.546875
Intel ifort:     13.71875     9.593750      9.609375
Lahey/Fujitsu:    1.59375     9.046875      9.671875
Why is Matlab so much faster at multiplying matrices? Obviously
they've optimized it aggressively, but shouldn't f95 compilers be
doing the same? I know the requirements are different: f95 must deal
with pointer arguments that view odd slices of matrices, while Matlab
lives in the f77 world where matrix elements are memory-contiguous in
column order. But the program below is the simple case -- shouldn't
the compiler detect that and dispatch accordingly?
I was wondering about trying a package like Intel's Math Kernel
Library, whose BLAS routines might speed up matrix multiplication as
Matlab has done. But a cursory inspection of the documentation
suggests that this will require making a lot of funny low-level
calls. What I really want is something that makes MATMUL run faster.
Any suggestions?
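
For reference, the "low-level call" in question is a single BLAS routine, DGEMM, which computes C := alpha*op(A)*op(B) + beta*C for double-precision matrices. The sketch below shows roughly what replacing one MATMUL with it looks like. It assumes a BLAS library (MKL or any other) is linked in; the link flags depend on the compiler and library and are not shown. The program name and the array names are made up for illustration, and the sketch has not been timed on any of the compilers listed above.

```fortran
program dgemm_sketch
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, parameter :: n = 1000
   real(dp), allocatable :: a(:,:), b(:,:), c(:,:), ref(:,:)
   real :: t1, t2
   external :: dgemm        ! BLAS routine, assumed to come from the linked library

   allocate(a(n,n), b(n,n), c(n,n), ref(n,n))
   call random_number(a)
   call random_number(b)
   c = 0

   call cpu_time(t1)
   ! 'N','N'      : use A and B as they are (no transpose)
   ! n, n, n      : rows of C, columns of C, inner dimension
   ! 1.0_dp       : alpha, the scale factor on A*B
   ! a, n / b, n  : each matrix followed by its leading dimension
   ! 0.0_dp       : beta, so the old contents of C are discarded
   call dgemm('N', 'N', n, n, n, 1.0_dp, a, n, b, n, 0.0_dp, c, n)
   call cpu_time(t2)
   print *, 'time using dgemm: ', t2 - t1

   ! Cross-check against the intrinsic
   ref = matmul(a, b)
   print *, 'max difference from matmul: ', maxval(abs(c - ref))
end program dgemm_sketch
```

A product of three matrices needs two such calls with a temporary in between, so hiding the call inside a small wrapper function keeps the calling code close to the MATMUL form.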
------------------------ file: matmul.f90 -----------------------
module data
   ! Result matrices live in a module so they are statically allocated
   real(kind(1.d0)), dimension(1000,1000) :: s1,s2,s3
end module data

program test
   use data
   implicit none
   integer, parameter :: n = 1000
   real(kind(1.d0)), dimension(n,n) :: a,b
   real :: t1,t2,e12,e13
   integer :: i,j,k

   call random_number(a)
   call random_number(b)

   ! 1. intrinsic MATMUL
   call cpu_time(t1)
   s1 = matmul(a,b)
   call cpu_time(t2)
   print *,'time using matmul: ',t2-t1

   ! 2. explicit triple loop
   call cpu_time(t1)
   do i=1,n
      do j=1,n
         s2(i,j) = 0
         do k=1,n
            s2(i,j) = s2(i,j) + a(i,k)*b(k,j)
         enddo
      enddo
   enddo
   call cpu_time(t2)
   print *,'triple loop time: ', t2-t1

   ! 3. double loop with DOT_PRODUCT on a row of a and a column of b
   call cpu_time(t1)
   do i=1,n
      do j=1,n
         s3(i,j) = dot_product(a(i,:),b(:,j))
      enddo
   enddo
   call cpu_time(t2)
   print *,'double loop with dot_product: ',t2-t1

   ! Report any disagreement between the three results
   e12 = maxval(abs(s2-s1))
   e13 = maxval(abs(s3-s1))
   if (e12 > 0) print *,'max error 1 and 2: ',e12
   if (e13 > 0) print *,'max error 1 and 3: ',e13
end program test
---------------------- end of: matmul.f90 -----------------------
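
One thing to note about the triple loop in matmul.f90: its innermost index k walks along a row of a, which in column-major storage is a stride-n access. A variant with i as the innermost index touches a and the result contiguously. The sketch below shows only the reordering and checks it against MATMUL; it uses a small n (an arbitrary choice to keep the check quick) and has not been timed on any of the compilers above. Whether it helps at all depends on how much loop interchange the compiler already does.

```fortran
program loop_order
   implicit none
   integer, parameter :: dp = kind(1.d0)
   integer, parameter :: n = 200      ! small size, chosen only for a quick check
   real(dp), allocatable :: a(:,:), b(:,:), c(:,:), ref(:,:)
   integer :: i, j, k

   allocate(a(n,n), b(n,n), c(n,n), ref(n,n))
   call random_number(a)
   call random_number(b)

   ! j-k-i order: the innermost loop runs down column j of c and
   ! column k of a, both contiguous in memory; b(k,j) is constant in it
   c = 0
   do j = 1, n
      do k = 1, n
         do i = 1, n
            c(i,j) = c(i,j) + a(i,k)*b(k,j)
         enddo
      enddo
   enddo

   ! The summation order over k is unchanged, so any difference from
   ! MATMUL should be at rounding level
   ref = matmul(a, b)
   print *, 'max difference from matmul: ', maxval(abs(c - ref))
end program loop_order
```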