Re: Bit counting problem



The magic behind Nils' solution, along with a 64-bit MMX version of it, is explained in AMD's Athlon Optimization Guide (Integer Optimization / Population Count; free download from AMD's website).
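For anyone who wants the trick in plain C, here is a minimal portable sketch of the same bit-slicing steps on one 32-bit word. It uses the names w, x and y from the MMX comments, and a multiply adds up the four byte counts at the end. The function name `popcount32_swar` is made up for this example, and this is not necessarily Nils' exact code.

```c
#include <stdint.h>

/* Hypothetical name: portable 32-bit SWAR population count. */
static unsigned popcount32_swar(uint32_t v)
{
    /* w: each 2-bit field now holds the number of set bits in that field (0..2) */
    uint32_t w = v - ((v >> 1) & 0x55555555u);
    /* x: each 4-bit field holds the count for its nibble (0..4) */
    uint32_t x = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    /* y: each byte holds the count for that byte (0..8); high nibbles masked off */
    uint32_t y = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* Multiplying by 0x01010101 adds all four byte counts into the top byte;
       partial sums never exceed 32, so no carries cross byte boundaries. */
    return (unsigned)((y * 0x01010101u) >> 24);
}
```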

I copy-pasted the raw MMX version below.

Eric


// MSVC inline assembly, 32-bit x86 only. PSADBW needs the Athlon's MMX
// extensions or SSE (Pentium III and later).
__declspec (naked) unsigned int __stdcall popcount64_1
(unsigned __int64 v)
{
    static const __int64 C55 = 0x5555555555555555;
    static const __int64 C33 = 0x3333333333333333;
    static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
    __asm {
        MOVD      MM0, [ESP+4]   ;v_low
        PUNPCKLDQ MM0, [ESP+8]   ;v (v_high:v_low)
        MOVQ      MM1, MM0       ;v
        PSRLD     MM0, 1         ;v >> 1 (each 32-bit half shifted separately)
        PAND      MM0, [C55]     ;(v >> 1) & 0x55555555
        PSUBD     MM1, MM0       ;w = v - ((v >> 1) & 0x55555555)
        MOVQ      MM0, MM1       ;w
        PSRLD     MM1, 2         ;w >> 2
        PAND      MM0, [C33]     ;w & 0x33333333
        PAND      MM1, [C33]     ;(w >> 2) & 0x33333333
        PADDD     MM0, MM1       ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
        MOVQ      MM1, MM0       ;x
        PSRLD     MM0, 4         ;x >> 4
        PADDD     MM0, MM1       ;x + (x >> 4)
        PAND      MM0, [C0F]     ;y = (x + (x >> 4)) & 0x0F0F0F0F, a 0..8 count per byte
        PXOR      MM1, MM1       ;0
        PSADBW    MM0, MM1       ;sum of |byte - 0| over all 8 bytes = total bit count
        MOVD      EAX, MM0       ;result in EAX per calling convention
        EMMS                     ;clear MMX state
        RET       8              ;pop 8-byte argument off stack and return
    }
}
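To check what the MMX routine computes on a machine without MMX, here is a portable C sketch that emulates it, assuming `uint64_t` is available. The helper names `lane_byte_counts` and `popcount64_emulated` are hypothetical. Each 32-bit half gets the same steps independently, as PSRLD/PSUBD/PADDD work per dword. The final loop stands in for PSADBW against a zeroed register, which simply adds up the eight byte counts.

```c
#include <stdint.h>

/* Hypothetical helper: the per-dword steps of the MMX code applied to one
   32-bit lane, leaving a bit count (0..8) in each byte. */
static uint32_t lane_byte_counts(uint32_t v)
{
    uint32_t w = v - ((v >> 1) & 0x55555555u);
    uint32_t x = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    return (x + (x >> 4)) & 0x0F0F0F0Fu;
}

/* Hypothetical emulation of popcount64_1 in portable C. */
static unsigned popcount64_emulated(uint64_t v)
{
    uint32_t lo = lane_byte_counts((uint32_t)v);         /* low dword of MM0 */
    uint32_t hi = lane_byte_counts((uint32_t)(v >> 32)); /* high dword of MM0 */
    uint64_t y = ((uint64_t)hi << 32) | lo;
    unsigned sum = 0;
    /* PSADBW with an all-zero operand: |byte - 0| == byte, so the
       sum of absolute differences is just the sum of the 8 bytes. */
    for (int i = 0; i < 8; i++)
        sum += (uint8_t)(y >> (8 * i));
    return sum;
}
```

On a plain 64-bit integer unit the byte sum is usually done with a multiply instead: `(y * 0x0101010101010101) >> 56` collects the eight byte counts in the top byte, since the total never exceeds 64.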