Re: improve strlen



Ahoy, here's the 'final' version that went into the experimental library.
I abstracted the functionality with template specialization. It is used
indirectly by a string class that is implemented with C++ template
metaprogramming. I just snipped the relevant portions; a lot of typedefs
etc. are not shown.

The code has been refactored for better performance, and it works on both
little- and big-endian architectures (tested on MIPS R10000 and PPC
G5). It goes without saying that it works on IA32 and AMD64 / x86-64.

// Generic version: plain byte-by-byte (element-by-element) scan.
template <typename chartype>
inline int string_length(const chartype* text)
{
    assert( text != NULL );

    const chartype* s = text;
    for ( ; *s; ++s )
        ;

    return static_cast<int>(s - text);
}

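// char specialization: scans one aligned 32-bit word at a time.
template <>
inline int string_length<char>(const char* text)
{
    assert( text != NULL );

    const char* p = text;

    // Number of bytes (0..3) up to the next 4-byte boundary.
    const char* base = 0;
    meta::intp address = static_cast<meta::intp>(text - base);
    unsigned int alignment = ((address + 3) & 0xfffffffc) - address;

    // Scan the unaligned head one byte at a time.
    if ( alignment )
    {
        for ( unsigned int i=0; i<alignment; ++i )
        {
            if ( *p++ == 0 )
                return static_cast<int>(p - text) - 1;
        }
    }

    // Scan aligned words; v becomes non-zero once a word contains a
    // zero byte. The last word may be read up to 3 bytes past the
    // terminator; an aligned word never straddles a page boundary.
    const uint32* ap = reinterpret_cast<const uint32*>(p);
    uint32 v = 0;

    for ( ; !v; )
    {
        uint32 u = *ap++;
        v = (u - 0x01010101) & ~u & 0x80808080;
    }

    // s is the offset of the second byte of the word that holds the
    // terminator, so the result is s-1, s, s+1 or s+2.
    uint32 s = static_cast<int>(reinterpret_cast<const char*>(ap) - p) +
        alignment - 3;

#if defined(FUSIONCORE_BIG_ENDIAN)

    // Caveat: the borrow in (u - 0x01010101) can set a spurious flag on
    // a 0x01 byte that directly precedes the terminator in the same
    // word, so this branch can return a too-short length for such
    // strings. The little-endian branch tests the lowest flag first,
    // which is always exact.
    if ( !(v & 0x80800000) )
        return v & 0x8000 ? s + 1 : s + 2;

    return v & 0x80000000 ? s - 1 : s;

#elif defined(FUSIONCORE_LITTLE_ENDIAN)

    if ( !(v & 0x8080) )
        return v & 0x00800000 ? s + 1 : s + 2;

    return v & 0x0080 ? s - 1 : s;

#else
#error "define FUSIONCORE_BIG_ENDIAN or FUSIONCORE_LITTLE_ENDIAN"
#endif
}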
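
To see what the word loop computes, here is a minimal standalone sketch
of the mask expression. zero_byte_mask is a made-up name, not something
from the library, and std::uint32_t stands in for the library's uint32
typedef. The lowest flagged byte is always a real zero; a flag above it
can be spurious when a 0x01 byte sits directly above a zero byte, because
the subtraction borrows into it.

```cpp
#include <cstdint>

// Hypothetical helper: same expression as in the word loop above.
// Returns a mask with 0x80 set in flagged bytes, or 0 if the word
// contains no zero byte.
inline std::uint32_t zero_byte_mask(std::uint32_t u)
{
    return (u - 0x01010101u) & ~u & 0x80808080u;
}

// zero_byte_mask(0x61626364) == 0x00000000   no zero byte
// zero_byte_mask(0x61620064) == 0x00008000   zero in bits 8..15
// zero_byte_mask(0x61010064) == 0x00808000   real zero in bits 8..15,
//                                            spurious flag on the 0x01
//                                            byte in bits 16..23
```

So a non-zero result reliably says "this word has a zero byte", but only
the lowest flag reliably says where it is.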

Note, the different offsets are: -1, 0, 1, 2. It might be possible to
compute those more efficiently; currently it uses an if-else-?: mess. If
we assign each bit a different weight and do a masked sum, we might get
the index adjustment value FAST, but it doesn't look like it's worth the
effort. I'll try not to post once more on the topic. :)
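
For what it's worth, the weighted masked sum could look roughly like
this. It is only a sketch for the little-endian case, and
first_zero_byte_le is a hypothetical helper, not part of the library. It
isolates the lowest flag bit of v and gives the flags in bytes 1, 2 and 3
the weights 1, 2 and 3; the offset adjustment relative to s would then be
the returned index minus 1.

```cpp
#include <cstdint>

// Hypothetical helper (little-endian only).
// v is the non-zero mask from (u - 0x01010101) & ~u & 0x80808080.
// Returns the index (0..3) of the first zero byte in memory order.
inline unsigned first_zero_byte_le(std::uint32_t v)
{
    // Keep only the lowest set flag: 0x80, 0x8000, 0x800000 or 0x80000000.
    const std::uint32_t low = v & (0u - v);

    // Weighted sum: at most one of the three terms is non-zero.
    return ((low >> 15) & 1u) * 1u
         + ((low >> 23) & 1u) * 2u
         + ((low >> 31) & 1u) * 3u;
}
```

Whether this beats two well-predicted branches is not something I have
measured.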

Hope this is my last post about this..

FYI, test results:

strlen() 24.0 usec
this version: 14.5 usec
asm version: 13.5 usec

That's on the same machine (Pentium M 1.8 GHz), with some number of
iterations and test repetitions: average of 20 tests (smallest and
largest result dropped from the average), with timings rounded to the
nearest 0.5 usec. The difference between the asm and C++ versions seems
negligible from a practical point of view ( < 8 % ).
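
The averaging can be sketched like this. The actual test harness was not
posted, so trimmed_mean and round_to_half are hypothetical helpers that
only illustrate "drop smallest and largest, average the rest, round to
0.5".

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper: mean of the samples after discarding the single
// smallest and the single largest value. Returns 0 for fewer than 3 samples.
inline double trimmed_mean(std::vector<double> samples)
{
    if (samples.size() < 3)
        return 0.0;

    std::sort(samples.begin(), samples.end());
    const double sum =
        std::accumulate(samples.begin() + 1, samples.end() - 1, 0.0);
    return sum / static_cast<double>(samples.size() - 2);
}

// Hypothetical helper: round a timing to the nearest 0.5 usec.
inline double round_to_half(double usec)
{
    return std::floor(usec * 2.0 + 0.5) / 2.0;
}
```

With the figures above, (14.5 - 13.5) / 13.5 is about 7.4 %, which is
where the "< 8 %" comes from.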

Thanks for the tip, this comes in handy (not that string initialization
has ever been a performance bottleneck for me, but in principle let's
have better code when possible, plus I enjoy a little optimization fun
now and then, so thank you!!!)
