Re: Optimizing a basic dot product loop

From: Phil Carmody (thefatphil_demunged_at_yahoo.co.uk)
Date: 05/07/04


Date: Thu, 6 May 2004 22:33:14 +0000 (UTC)

Christophe Grimault <christophe.grimault@sacet.com> writes:

> Hi all,
>
> I have this loop, in C (gcc/g++), and I'm reaching the limits of my
> "optimization skills". All is done with pointers, and I have unrolled
> the computation loop by 4.
>
> register int n,k,m;
> register float acc_r, acc_i;
> register float *ptr_h, *ptr_d;
>
> for ( n = 0, m = 0 ; n < Nx; n += K, m++){
>
> ptr_h = (float*)h.data();
> ptr_d = (float*)(d.data()+Nh-1+n);
>
> for ( k = 0, acc_r = 0.0f, acc_i = 0.0f ; k < 4*(Nh>>2); k+=4) {
> acc_r += *ptr_h * *ptr_d--;
> acc_i += *ptr_h++ * *ptr_d--;
>
> acc_r += *ptr_h * *ptr_d--;
> acc_i += *ptr_h++ * *ptr_d--;
>
> acc_r += *ptr_h * *ptr_d--;
> acc_i += *ptr_h++ * *ptr_d--;
>
> acc_r += *ptr_h * *ptr_d--;
> acc_i += *ptr_h++ * *ptr_d--;
> }
>
> Is it possible to do better in C ?

Is there any reason why you decrement ptr_d 8 times in the loop?
Just index it with constant offsets, and adjust the pointer once per
iteration (the same goes for the 4 increments of ptr_h). A decent
compiler might perform this optimisation automatically, but why not
give it some help?
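
A minimal sketch of the offset idea, assuming the layout implied by the
quoted loop: h holds real coefficients, and d points at the first float
the original code reads before walking downwards. The function name
dot_offsets and its parameters are made up for illustration; they are
not from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one pass of the quoted inner loop, rewritten so
 * that each pointer is updated once per unrolled iteration.
 *   h  : nh real coefficients, read upwards
 *   d  : first float to read; the loop walks downwards from here
 * Like the quoted code, only the first 4*(nh>>2) coefficients are used. */
static void dot_offsets(const float *h, const float *d, size_t nh,
                        float *out_r, float *out_i)
{
    float acc_r = 0.0f, acc_i = 0.0f;
    size_t n4 = nh & ~(size_t)3;   /* same value as 4*(Nh>>2) */
    size_t k;

    assert(h && d && out_r && out_i);

    for (k = 0; k < n4; k += 4) {
        /* Same operations, in the same order, as the original loop. */
        acc_r += h[0] * d[0];
        acc_i += h[0] * d[-1];

        acc_r += h[1] * d[-2];
        acc_i += h[1] * d[-3];

        acc_r += h[2] * d[-4];
        acc_i += h[2] * d[-5];

        acc_r += h[3] * d[-6];
        acc_i += h[3] * d[-7];

        h += 4;   /* one update instead of four */
        d -= 8;   /* one update instead of eight */
    }

    *out_r = acc_r;
    *out_i = acc_i;
}
```

In the quoted outer loop this would be called as something like
dot_offsets(ptr_h, ptr_d, Nh, &acc_r, &acc_i) once per value of n.
Whether it beats the original depends on the compiler and flags, so it
is worth comparing the generated assembly or timing both.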

> I am planning to use inline asm to gain speed. Is it possible to gain
> some speed ? Is it "necessary / possible / useful" to make use of
> SSE/MMX/3DNow/SSE2 asm instruction. Has somebody done something like
> this and could give me pointers to a source code or basic technique to
> rely on ?

Grab the various AMD documents - they have example code for situations
just like this. And it's not just the computation part that matters;
even simple memory access patterns can be important too (e.g. you might
want to prefetch).
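
A rough sketch of what prefetching could look like in plain C with gcc,
using the __builtin_prefetch builtin rather than inline asm. The names
dot_prefetch, PREFETCH_READ and PREFETCH_AHEAD are hypothetical, and the
distance of 64 floats is an arbitrary guess, not a recommendation from
the AMD documents; it has to be tuned by measurement, and on CPUs with
good hardware prefetchers it may make no difference at all.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tuning knob: how many floats ahead of the current read
 * position to request. The value is a placeholder to be measured. */
#define PREFETCH_AHEAD 64

#if defined(__GNUC__)
/* GCC builtin: rw = 0 (read), locality = 3 (keep in all cache levels). */
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)   /* no-op on other compilers */
#endif

/* Hypothetical helper: same arithmetic as the quoted inner loop, with a
 * prefetch hint on the d stream, which is read downwards from d. */
static void dot_prefetch(const float *h, const float *d, size_t nh,
                         float *out_r, float *out_i)
{
    float acc_r = 0.0f, acc_i = 0.0f;
    size_t n4 = nh & ~(size_t)3;   /* same value as 4*(Nh>>2) */
    size_t k;

    assert(h && d && out_r && out_i);

    for (k = 0; k < n4; k += 4) {
        /* 2*(n4-k) floats of d are still to be read; only hint when the
         * target address lies inside that remaining range. */
        if (2 * (n4 - k) > PREFETCH_AHEAD)
            PREFETCH_READ(d - PREFETCH_AHEAD);

        acc_r += h[0] * d[0];
        acc_i += h[0] * d[-1];
        acc_r += h[1] * d[-2];
        acc_i += h[1] * d[-3];
        acc_r += h[2] * d[-4];
        acc_i += h[2] * d[-5];
        acc_r += h[3] * d[-6];
        acc_i += h[3] * d[-7];

        h += 4;
        d -= 8;
    }

    *out_r = acc_r;
    *out_i = acc_i;
}
```

The hint is only a request: the result is the same with or without it,
so the only way to know if it pays off is to time both versions on the
target machine with realistic Nh and Nx.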

Phil

-- 
1st bug in MS win2k source code found after 20 minutes: scanline.cpp
2nd and 3rd bug found after 10 more minutes: gethost.c
Both non-exploitable. (The 2nd/3rd ones might be, depending on the CRTL)

