Re: duff's device / loop unriolling




In article <0001HW.BF2BCBB9008348F7F0386550@xxxxxxxxxxxxxxxx>, Randy Howard <randyhoward@xxxxxxxxxxxxxxxxx> writes:
>
> It's suddenly being discussed again, probably as a result of an
> article in DDJ (Doctor Dobb's Journal) about it in the August
> issue. Ralf Holly proposed using it in macro form for generic
> loop unrolling, which probably makes people misunderstand its
> original purpose.
>
> He proposed something like

[Tabs converted to spaces to avoid wrapping.]

> #define DUFF_DEVICE_8(macroCount, macroAction) \
> do { \
> size_t duffcount = (macroCount); \
> size_t dufftimes = (duffcount + 7) >>3u; \
> switch(duffcount & 7) { \
> case 0: do { macroAction; \
> case 7: macroAction; \
> case 6: macroAction; \
> case 5: macroAction; \
> case 4: macroAction; \
> case 3: macroAction; \
> case 2: macroAction; \
> case 1: macroAction; \
> } while (--dufftimes > 0); \
> } \
> } while (0)
>
> Of course, the caller has to know not to call with a 0
> countvalue or it will execute 8 times instead.

Unless I'm missing something, that's easily fixed with a

if (!duffcount) break;

before the switch. (Testing duffcount avoids using macroCount,
which might have side effects, twice, of course.)

A worse problem would be using it with a negative count; hopefully
the compiler would provide a useful diagnostic for the conversion
between a signed value and a size_t when duffcount is initialized,
but that's a QoI issue. (Also, I know far too many C programmers
who routinely ignore such diagnostics, partly because their code is
full of them. I suppose that's a QoP issue.)

> Probably not all that practical in general use, but you might
> find somewhere, (after profiling) where it makes some sense.

Agreed. I have found cases where relatively recent commercial
implementations don't unroll even simple loops where unrolling has a
significant benefit. Of course, if the loop isn't in a performance-
critical path, it doesn't matter anyway.

--
Michael Wojcik michael.wojcik@xxxxxxxxxxxxxx

We are subdued to what we work in. (E M Forster)
.



Relevant Pages

  • Re: Spill code and unrolling
    ... I've a question about loop unrolling. ... which are candidates for loop unrolling contain any spill code. ... a loop may help the instruction scheduler (at compilation or execution ...
    (comp.compilers)
  • Re: Faster HexToBuffer Routines
    ... Here is a WORKING unrolled loop version of my routine. ... Unrolling the loop, of course, speeds things up ... most of the string instructions ...
    (alt.lang.asm)
  • Re: Implement memcmp function with help from gccs inline asm
    ... Unrolling a loop is reducing the number ... >>> of iterations by duplicating the loop body. ... > test ecx, ecx ... > My version of this code has 2 fewer instructions in the loop body. ...
    (comp.lang.asm.x86)
  • Cache size restrictions obsolete for unrolling?
    ... into the cache (i.e. the code of the loops was slightly smaller than ... execution time using a cycle-true simulator. ... unrolling factor stepwise resulting in the unrolled loop that exceeded ... I expected to get a performance decrease, i.e. the stronger the loop ...
    (comp.compilers)