Re: improve strlen



If you're going to make a big deal out of different backends, at least
take some effort to use the instructions that are available, something
along these lines...

pxor mm0,mm0
xloop:
pcmpeqb mm0,[esi]
pmovmskb eax,mm0
pxor mm0,mm0
add esi,8
// ...

Feel free to change the order if you think it helps.

- do a 64-bit / 8-component compare: each byte cell of the dest is set
to 0xff if the corresponding byte in the source operand was zero, and
to 0x00 otherwise. Then pack the MSB of each component into a 32-bit
register.
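
For anyone without the instruction reference handy, here is a rough C
model of what the pcmpeqb / pmovmskb pair computes for one 8-byte block.
The helper name zero_byte_mask is made up for illustration, and the loop
only stands in for what the hardware does in two instructions, so treat
it as a sketch rather than the real thing.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): models pcmpeqb
   against an all-zero register followed by pmovmskb.
   Bit i of the result is set when p[i] == 0. Reads exactly 8 bytes. */
static unsigned zero_byte_mask(const unsigned char *p)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; i++) {
        /* pcmpeqb: the cell becomes 0xff on a match with zero, else 0x00 */
        uint8_t cell = (p[i] == 0) ? 0xff : 0x00;
        /* pmovmskb: the MSB of cell i becomes bit i of the result */
        mask |= (unsigned)(cell >> 7) << i;
    }
    return mask;
}
```

For the block 'a','b','c',0,'x',0,'y','z' this returns 0x28, i.e. bits 3
and 5 are set, one for each zero byte.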

From there on the rest is too trivial to even mention. About
instruction counts: your technique uses roughly 1.5 instructions per
char, this uses roughly 0.5 instructions per char and does 64-bit
aligned reads. How much faster it is in practice.. if you want to
know.. find out!
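
In case "the rest" isn't obvious to everyone, a minimal C sketch of one
way to finish it: walk bytes until the pointer is 8-byte aligned, then
test one aligned block per iteration and take the lowest set bit of the
mask (what bsf would give you in asm). The names strlen_by8 and
zero_byte_mask are made up here. It also assumes the memory is readable
up to the next 8-byte boundary past the terminator, which is fine for
aligned reads on x86 but is not something ISO C promises.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: models pcmpeqb against zero + pmovmskb.
   Bit i of the result is set when p[i] == 0. */
static unsigned zero_byte_mask(const unsigned char *p)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)(p[i] == 0) << i;
    return mask;
}

/* Hypothetical name. Assumes bytes up to the next 8-byte boundary
   after the terminator are readable (see the note above). */
static size_t strlen_by8(const char *s)
{
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *p = start;

    /* Byte-by-byte until p is 8-byte aligned. */
    while (((uintptr_t)p & 7u) != 0) {
        if (*p == 0)
            return (size_t)(p - start);
        p++;
    }

    /* Main loop: one aligned 8-byte block per iteration. */
    for (;;) {
        unsigned mask = zero_byte_mask(p);
        if (mask != 0) {
            size_t i = 0;
            while ((mask & 1u) == 0) {  /* index of the lowest set bit */
                mask >>= 1;
                i++;
            }
            return (size_t)(p - start) + i;
        }
        p += 8;
    }
}
```

With a zero-padded buffer such as char buf[32] = "hello, world", both
strlen_by8(buf) and strlen_by8(buf + 3) should agree with plain strlen.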

P.S. Don't worry, I've been around..
