Re: improve strlen
- From: "jukka@xxxxxxxxxxxx" <spamtrap@xxxxxxxxxx>
- Date: Mon, 24 Oct 2005 22:16:22 +0000 (UTC)
Correction to the previous post, here is the corrected code..
regression tests pay off, noticed "slight" bug in the SBB handling.. I
falsely wrote the C code as-if the sbb was using ecx, not cl.. this
makes a big difference, the bit we wanted to inspect was 7, not 31..
I also tested different versions of the "sum" routine (basicly, if in a
byte the bit 7 is set increase sum...), tried this for example:
s += !((v >> 31) & 1) + ...;
That's branchless.. but I didn't like the NOT's there so I did add:
v = ~v;
s += ...;
No dice: still not good, the code was slower than the one in the fixed
version below.. the divide-and-conquer technique that has two
compare-branches still beats the branch-free version (testing on
Pentium M).
This code is somewhat regress tested with random generated strings,
against strlen() for returning correct length.. and also doing bound
checking on array access. Note: there are some bytes
after-reserved-memory depending on left over chars in a dword.. 3, 2 or
1 bytes are read too much. This isn't a problem with Windows memory
allocator for instance which reserves memory in *heap* using new/malloc
in 8 byte chunks. This isn't problem on stack either with objects with
"auto" storage class, as stack is valid reading area.. ofcourse the
chars after terminating null character are garbage, but this code
doesn't care about that.
I didn't think how this works on big endian architecture nor the code
is portable in current shape anyway. For instance, aliasing the "const
char*" with "unsigned int" is very bad programming in general, the
conversion from pointer to integer should be done differently.. but I
do it that way for this version of the code only. :)
Actually, writing portable code is harder than it seems.. and that code
ain't. But demonstrates the concept of "C/C++ code isn't that slow" ;-)
int xstrlen(const char* text)
{
const char* p = text;
unsigned int a = reinterpret_cast<unsigned int&>(text);
unsigned int alignment = ((a + 3) & 0xfffffffc) - a;
unsigned int s = alignment;
if ( alignment )
{
for ( unsigned int i=0; i<alignment; ++i )
{
if ( *p++ == 0 )
return static_cast<int>(p - text - 1);
}
}
s -= reinterpret_cast<unsigned int&>(p) + 3;
const unsigned int* ap = reinterpret_cast<const unsigned int*>(p);
unsigned int v = 0;
for ( ; !v; )
{
unsigned int u = *ap++;
v = (u - 0x01010101) & ~u & 0x80808080;
}
if ( !(v & 0x8080) )
{
v >>= 16;
s += 2;
}
if ( !(v & 0x80) )
{
++s;
}
return reinterpret_cast<unsigned int&>(ap) + s - 1;
}
Some refactoring might still pay off.. looking into how "s" is updated
there might be room for improvement, but I think the basic idea is
nailed down more or less now. I think one or two variables could be
dropped out without aliasing problems.. but I think, the innerloop is
where the action is at, and that's pretty decent as things were.. so
optimizing further might not yield very good improvements, except with
very short strings.. well, which are, fast case anyway! So that about
wraps it up..
.
- Follow-Ups:
- Re: improve strlen
- From: jukka@xxxxxxxxxxxx
- Re: improve strlen
- References:
- improve strlen
- From: Claudio Daffra
- Re: improve strlen
- From: spamtrap
- Re: improve strlen
- From: hutch--
- Re: improve strlen
- From: spamtrap
- improve strlen
- Prev by Date: Re: About instruction lea
- Next by Date: Re: compiler generated output
- Previous by thread: Re: improve strlen
- Next by thread: Re: improve strlen
- Index(es):