Re: 64-bit shared counter
From: Terje Mathisen (terje.mathisen_at_hda.hydro.com)
Date: 03/26/04
- Next message: Matt Taylor: "Re: newbie about winAPI"
- Previous message: Matt Taylor: "Re: x86 architecture questions"
- In reply to: Matt Taylor: "Re: 64-bit shared counter"
- Next in thread: Matt Taylor: "Re: 64-bit shared counter"
- Reply: Matt Taylor: "Re: 64-bit shared counter"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 26 Mar 2004 16:40:13 +0000 (UTC)
Matt Taylor wrote:
> "Terje Mathisen" <terje.mathisen@hda.hydro.com> wrote in message
>>OK, in that case my code is definitely significantly faster, even if we
>>allow a pure 2X speedup from 2.8 vs 1.4 GHz (definitely not the case
>>here, with RAM speed being the limiter), it seems to run 7 times faster
>>than the various versions where the core code handles the carry
>
> propagation.
> <snip>
>
> I let it run for a couple days and then stopped. There was a huge disparity
> between run times after I inlined the functions, so it seems the function
> call overhead was more significant than I had at first thought. With that in
> mind, I dropped the run time to 2^36 locks and added your watchdog routine.
>
> I have placed the C++ and asm versions here:
> http://rabbithole.cc/ctr.cpp
> http://rabbithole.cc/ctr.asm
>
> My results show your watchdog routine consistently being >50% slower than
> the other versions. The 5th is the fastest. I don't understand why.
I do!
Your version of my code is wrong, as well as slow:
UINT64 __forceinline InterlockedIncrement64_1(volatile UINT64 *pn)
{
_asm
{
mov ecx, [pn]
mov eax, 1
lock xadd [ecx], eax
mov edx, fs:[g_pvHigh]
cmp eax, fs:[g_pvLow]
adc edx, 0
mov fs:[g_pvLow], eax
mov fs:[g_pvHigh], edx
}
}
Firstly, there is no need at all for the fs: segment overrides, the
shared 64-bit variable can reside anywhere in memory as long as all the
incrementing threads/processes can _read_ it!
Secondly, you are _writing back_ the updated results on every iteration,
which is both horribly slow (think ping-pong of the cache line), and
simply wrong:
This code depends critically on having a single watchdog thread that is
responsible for all actual updates, and which does them by copying the
updated 64-bit count to a new memory location, then doing an atomic
write to the 32-bit pointer to the new location.
Since the watchdog is read-only on every access until the wraparound
happens, it can work in parallel with the incrementers with low to zero
impact. On an update, it is the only writer to the 64-bit copies, so it
will never suffer locking delays caused by the incrementer threads.
With every incrementer writing back, you have a race condition between
loading te previous 64-bit value, doing the compare (propagating any
carry), and writing back the updated result.
>
> The latest results are:
> Routine 1: 12698 seconds (Terje's)
2^36 is 16 times 2^32, taking my 30-38 seconds per epoch as a starting
point, I'd get about 600 seconds or 10 minutes for the same job!
> Routine 2: 8428 seconds (base)
> Routine 3: 8121 seconds (Paul's)
> Routine 4: 8674 seconds (TLS)
> Routine 5: 8007 seconds (Paul's with early out spin)
>
> I have tried splitting the low & high parts into 2 different cache lines.
> The results are similar. Most routines are faster with them in the same
> cache line due to the locked updates on both halves.
That's one of the problems: YOu do need these numbers to be in different
cache lines, but that is so that you can have exclusive access to the
32-bit counter while doing read-only accesses to the full 64-bit copy.
Terje
>
> The only thing I can figure on the watchdog routine is that perhaps the fs
> references are slowing it down.
>
> -Matt
>
>
-- - <Terje.Mathisen@hda.hydro.com> "almost all programming can be viewed as an exercise in caching"
- Next message: Matt Taylor: "Re: newbie about winAPI"
- Previous message: Matt Taylor: "Re: x86 architecture questions"
- In reply to: Matt Taylor: "Re: 64-bit shared counter"
- Next in thread: Matt Taylor: "Re: 64-bit shared counter"
- Reply: Matt Taylor: "Re: 64-bit shared counter"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]