96-bit lockless counter

From: Terje Mathisen (terje.mathisen_at_hda.hydro.com)
Date: 03/20/04

  • Next message: hutch--: "Re: asm"
    Date: Sat, 20 Mar 2004 09:32:05 +0000 (UTC)
    
    

    I mentioned a day or two ago that I had figured out a way to support
    longer counters, with better performance than the current 63-bit code,
    well here it is:

    The crucial idea is that _only_ the bottom 32 bits of the counter is
    shared, all other processes maintain their own copy of the higher parts!

    This is easy to do if you can guarantee that all threads will sample
    (not neccessarily upate!) the counter at least once per full 2^32 round:

    Since the basic update (LOCK XADD [counter],EAX) must use a full
    read-modify-update bus cycle, it cannot take less time than the basic
    bus speed, right?

    Assuming this can increase a _lot_, getting down to current cache
    speeds, we're still talking about at least 10 ns per update for the next
    decade or two.

    This means that the counter cannot roll over more often than once every
    42 seconds, more probably it will be in the several minutes range.

    If you cannot guarantee that all threads will wake up often enough to do
    this, then it can still be solved if you setup an extra helper thread
    that wakes up each second, and goes around to all of the separate
    counters, updating the high counts if needed.

    Here's the sampling code:

    sampleAndIncrement: ; EBX -> local copy of the full-sized counter
            mov eax,1
            lock xadd [counter],eax
            cmp eax,[ebx]
            mov ecx,[ebx+4]
            mov edx,[ebx+8]
    ; If we have wrapped around, the high parts must be updated.
    ; This can be done branchlessly using the carry from the CMP:
            adc ecx,0
            adc edx,0
            mov [ebx],eax
            mov [ebx+4],ecx
            mov [ebx+8],edx
            ret

    However, it is almost certainly faster to skip the updates that aren't
    needed:

            mov [ebx],eax
             jnc done
            adc ecx,0
            adc edx,0
            mov [ebx+4],ecx
            mov [ebx+8],edx
    done:
            ret ; EBX points at counter,
                            ;which is also in EDX:ECX:EAX
    Terje

    -- 
    - <Terje.Mathisen@hda.hydro.com>
    "almost all programming can be viewed as an exercise in caching"
    

  • Next message: hutch--: "Re: asm"

    Relevant Pages