Re: I need the fastest routine



"Rudy Velthuis [TeamB]" <newsgroups@xxxxxxxxxxxx> wrote in message
news:xn0fsceh7g1c0lk01gnewsgroups@xxxxxxxxxxxxxxx
Dan Downs wrote:

I've found loop unrolling to be faster in the past (until the K8 and
P4), but current processors minimize the jump in the loop so much
that it doesn't really do the trick anymore.

I tried the maxtest2.dpr, ans found some interesting things (for me).
(Using D7 on an AMD Athlon 850 MHz.

Trkulja's loop unrolling code, if modified to be using local max and
min values (like Seidel does), is the fastest, but if it doesn't use
local min and max, it is a little slower than normal loops. Clèment
posted the code in his latest post.

I tried changing my own code to use local in and max. There were virtually
no difference (2 on the fourth decimal).

So loop unrolling can be faster, but apparently only under certain
conditions.

Then i tried 'unrolling' with the following construct (A little Q&D, but
just to see the diff):
TYPE
tIa = PACKED ARRAY [0..3] OF INTEGER ;
tpIa = ^tIa ;
var
I: Integer;
P: PInteger;
pIa : tPIa ;
iMin,iMax: INTEGER ;
begin
iMax := MinInt;
iMin := MaxInt;
pIa :=@aArray[0];
for I := 0 to High(aArray) DIV 4 do
begin
if ( PIa^[0] > iMax ) OR ( PIa^[1] > iMax ) OR ( PIa^[2] > iMax ) OR (
PIa^[3] > iMax ) then BEGIN
IF PIa^[0] > iMax THEN
iMax := PIa^[0];
IF PIa^[1] > iMax THEN
iMax := PIa^[1];
IF PIa^[2] > iMax THEN
iMax := PIa^[2];
IF PIa^[3] > iMax THEN
iMax := PIa^[3];
END ;
if ( PIa^[0] < iMin ) OR ( PIa^[1] < iMin ) OR ( PIa^[2] < iMin ) OR (
PIa^[3] < iMin ) then BEGIN
IF PIa^[0] < iMin THEN
iMin := PIa^[0];
IF PIa^[1] < iMin THEN
iMin := PIa^[1];
IF PIa^[2] < iMin THEN
iMin := PIa^[2];
IF PIa^[3] < iMin THEN
iMin := PIa^[3];
END ;
Inc(PIa);
end;
aMax := iMin;
aMin := iMax;
end;

This gave some performance improvements (loop = 10 times):
0,288898 s
0,243985 s

Then i tried 'unrolling' with an array of 8, at then it was:
0,288898 s-mine
0,243985 s-mine 4
0,236794 s-mine-8
0,289013 s - unmodified
0,289026 s - unmodified
0,246168 s - unmodified

So it seems that to a certainn point, there could be some gain in unrolling.
But this is old HW, so it could be changed.

--
Med venlig hilsen/Best regards
Stig Johansen



.



Relevant Pages

  • Re: I need the fastest routine
    ... I have changed it to 1 offset instead of 0 offset. ... Using pointers ... IF PIa^> iMax THEN ... IF PIa^< iMin THEN ...
    (borland.public.delphi.language.basm)
  • Re: I need the fastest routine
    ... I really can't thank you enough for helping me with this one. ... I just can't imagine the performance boost if I replace the minmax routine with the one you provided. ... I am working with local variables iMin and iMax instead of aMin and aMax. ...
    (borland.public.delphi.language.basm)
  • Re: Generalized range
    ... I think just increasing a "counter" from iMin to iMax should be fine, achieving more precision by making computations fuzzy -- i don't know, perhaps it's very good. ... elsewhere) is the addition/subtraction of a fixed epsilon to ensure ...
    (comp.lang.python)
  • Generalized range
    ... return range(iMin, iMax, iStep) ... elsewhere) is the addition/subtraction of a fixed epsilon to ensure ... particular values of iMax, iMin and iStep. ...
    (comp.lang.python)
  • Re: Problem compiling old code with gfortran
    ... In the first iteration, i starts as imin. ... start at imin + iadd or inew + iadd? ... trick to make the compiler unaware of what I was doing ... ... The loop would simply continue, ...
    (comp.lang.fortran)