Re: Optimizing a basic dot product loop
From: Phil Carmody (thefatphil_demunged_at_yahoo.co.uk)
Date: 05/07/04
- Next message: Robert Wessel: "Re: I/O Address Space"
- Previous message: Eugene: "Re: Optimizing a basic dot product loop"
- In reply to: Christophe Grimault: "Optimizing a basic dot product loop"
Date: Thu, 6 May 2004 22:33:14 +0000 (UTC)
Christophe Grimault <christophe.grimault@sacet.com> writes:
> Hi all,
>
> I have this loop, in C (gcc/g++), and I'm reaching the limits of my
> "optimization skills". All is done with pointers, and I have unrolled
> the computation loop by 4.
>
> register int n,k,m;
> register float acc_r, acc_i;
> register float *ptr_h, *ptr_d;
>
> for ( n = 0, m = 0 ; n < Nx; n += K, m++){
>
>     ptr_h = (float*)h.data();
>     ptr_d = (float*)(d.data()+Nh-1+n);
>
>     for ( k = 0, acc_r = 0.0f, acc_i = 0.0f ; k < 4*(Nh>>2); k+=4) {
>         acc_r += *ptr_h * *ptr_d--;
>         acc_i += *ptr_h++ * *ptr_d--;
>
>         acc_r += *ptr_h * *ptr_d--;
>         acc_i += *ptr_h++ * *ptr_d--;
>
>         acc_r += *ptr_h * *ptr_d--;
>         acc_i += *ptr_h++ * *ptr_d--;
>
>         acc_r += *ptr_h * *ptr_d--;
>         acc_i += *ptr_h++ * *ptr_d--;
>     }
>
> Is it possible to do better in C ?
Is there any reason why you decrement ptr_d 8 times in the loop?
Just read through it with constant offsets, and update the pointer
once per iteration. A decent compiler might perform this optimisation
automatically, but why not give it some help?
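
A minimal sketch of the offset idea, not code from the original thread. The function name dot_rev_offsets and its parameters are hypothetical. It keeps the same read order as the quoted inner loop (h forwards, d backwards, two floats of d per tap of h), but uses an index rather than a raw pointer so that the final update never forms an out-of-range pointer. It assumes the tap count is a multiple of 4, where the quoted loop silently drops the remainder.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not from the original post.
 * h is read forwards; d is read backwards starting at float index `top`.
 * All reads use constant offsets, and the indices change once per
 * unrolled block instead of once per read.
 * Assumes nh is a multiple of 4 and d[top - 2*nh + 1 .. top] is valid.
 */
void dot_rev_offsets(const float *h, const float *d, ptrdiff_t top,
                     size_t nh, float *out_r, float *out_i)
{
    float acc_r = 0.0f, acc_i = 0.0f;
    ptrdiff_t i = top;   /* highest d index read by the current block */
    size_t k;

    assert(nh % 4 == 0);
    assert(top + 1 >= (ptrdiff_t)(2 * nh));

    for (k = 0; k < nh; k += 4, i -= 8) {
        acc_r += h[k]     * d[i];
        acc_i += h[k]     * d[i - 1];
        acc_r += h[k + 1] * d[i - 2];
        acc_i += h[k + 1] * d[i - 3];
        acc_r += h[k + 2] * d[i - 4];
        acc_i += h[k + 2] * d[i - 5];
        acc_r += h[k + 3] * d[i - 6];
        acc_i += h[k + 3] * d[i - 7];
    }

    *out_r = acc_r;
    *out_i = acc_i;
}
```

Whether this beats the compiler's own output depends on the compiler and flags, so it is worth timing both forms.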
> I am planning to use inline asm to gain speed. Is it possible to gain
> some speed? Is it "necessary / possible / useful" to make use of
> SSE/MMX/3DNow/SSE2 asm instructions? Has somebody done something like
> this and could give me pointers to source code or a basic technique to
> rely on?
Grab the various AMD documents - they have example code for situations
just like this. And it's not just the computation part that matters:
the memory access pattern can be important too (e.g. you might want
to prefetch).
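
A rough sketch of what software prefetching can look like with gcc, not taken from the AMD documents or from the original post. It uses a plain forward dot product of two real arrays rather than the quoted loop, to keep the idea visible. The name dot_prefetch, the 64-byte cache line, and the look-ahead distance are all assumptions to be tuned and measured. __builtin_prefetch is a GCC/Clang builtin and is only a hint.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16           /* floats per 64-byte cache line (assumed) */
#define PREFETCH_AHEAD 32  /* floats to look ahead (assumed, tune it) */

/* Hypothetical helper: forward dot product with one prefetch per block. */
float dot_prefetch(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t i = 0, j;

    assert(n == 0 || (a != NULL && b != NULL));

    for (; i + BLOCK <= n; i += BLOCK) {
        /* Only prefetch addresses that lie inside the arrays. */
        if (i + PREFETCH_AHEAD < n) {
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
            __builtin_prefetch(&b[i + PREFETCH_AHEAD], 0, 3);
        }
        for (j = 0; j < BLOCK; j++)
            acc += a[i + j] * b[i + j];
    }

    /* Remaining elements that do not fill a whole block. */
    for (; i < n; i++)
        acc += a[i] * b[i];

    return acc;
}
```

On CPUs with good hardware prefetchers a simple sequential pattern like this may gain nothing, so treat it as something to benchmark rather than assume.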
Phil
--
1st bug in MS win2k source code found after 20 minutes: scanline.cpp
2nd and 3rd bug found after 10 more minutes: gethost.c
Both non-exploitable. (The 2nd/3rd ones might be, depending on the CRTL)