Re: Fastcode CharPosRev B&V 0.6.2



Hi Dennis,

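There is still a weakness in the validation. The modified Validate13 listed
below tests more thoroughly that a function does not report a match before
the first character of the string. The string lengths are chosen so that some
or all bytes of the length integer stored just before the string data equal
the search character (#1), while the string itself contains only #0.

Some of your functions, including your fast SSE2 function, fail this
validation.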

function TMainForm.Validate13 : Boolean;
var
  S: String;
  Ch: Char;
  I: Integer;
const
  VALIDATENO : Cardinal = 13;
begin
  SetLength(S, $01010101); {Sets All Bytes of Length Integer to 1}
  FillChar(S[1], Length(S), 0);
  Ch := #1;
  Result := CharPosRevFunction(Ch, S) = 0;
  if not Result then
    ErrorTrap(VALIDATENO, Ch, S);
  if not Result then
    Exit;
  SetLength(S, $010101); {Sets Lower 3 Bytes of Length Integer to 1}
  FillChar(S[1], Length(S), 0);
  Ch := #1;
  Result := CharPosRevFunction(Ch, S) = 0;
  if not Result then
    ErrorTrap(VALIDATENO, Ch, S);
  if not Result then
    Exit;
  SetLength(S, $0101); {Sets Lower 2 Bytes of Length Integer to 1}
  FillChar(S[1], Length(S), 0);
  Ch := #1;
  Result := CharPosRevFunction(Ch, S) = 0;
  if not Result then
    ErrorTrap(VALIDATENO, Ch, S);
  if not Result then
    Exit;

  for I := 1 to 100 do
    begin
      SetLength(S, I);
      FillChar(S[1], Length(S), 0);
      Ch := #1;
      Result := CharPosRevFunction(Ch, S) = 0;
      if not Result then
        begin
          ErrorTrap(VALIDATENO, Ch, S);
          Exit;
        end;
    end;

end;
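
For comparison, here is a minimal pure Pascal sketch of a function that
should pass this validation. It is not one of the Fastcode entries; the name
CharPosRev_Ref is made up for illustration, and AnsiString/AnsiChar are
written out explicitly as an assumption so the code behaves the same on newer
compilers. The index never drops below 1, so the length integer in front of
S[1] is never compared against the search character.

```pascal
program CharPosRevRef;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Hypothetical reference implementation, for illustration only }
function CharPosRev_Ref(SearchChar: AnsiChar; const S: AnsiString): Integer;
begin
  Result := Length(S);
  { Scan backwards; stop at a match or when Result reaches 0 (not found). }
  { Short-circuit evaluation means S[0] is never read.                    }
  while (Result > 0) and (S[Result] <> SearchChar) do
    Dec(Result);
end;

var
  S: AnsiString;
begin
  SetLength(S, $0101);              { lower 2 bytes of the length are 1 }
  FillChar(S[1], Length(S), 0);
  WriteLn(CharPosRev_Ref(#1, S));   { prints 0: no #1 inside the string }
  S[5] := #1;
  WriteLn(CharPosRev_Ref(#1, S));   { prints 5 }
end.
```

A function that instead compares bytes in groups of four and does not check
the resulting index could "find" #1 inside the length integer for the same
input.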



Also, I was interested to note how much faster the DKC_SSE2_1 function was
on AMD CPUs than on Intel. I had discounted similar functions when
testing on my P4.

The function listed below is my "guess" at a function that should perform
well on AMD CPUs.


function CharPosRev_JOH_IA32_4_a(SearchChar : Char; const S: string) : Integer;
asm {136 Bytes}
  test      edx, edx             {S = nil?}
  jz        @@NotFound           {Yes, Exit with Result = 0}
  mov       ecx, eax             {SearchChar}
  mov       eax, [edx-4]         {Length(S)}
  cmp       eax, 32
  jg        @@Large              {Length(S)>32}
@@Loop:
  cmp       cl, [eax+edx-1]      {Check Next Character from End}
  je        @@1                  {Exit on Match}
  cmp       cl, [eax+edx-2]      {Check Character Before}
  je        @@2                  {Exit on Match}
  cmp       cl, [eax+edx-3]      {Check Character Before}
  je        @@3                  {Exit on Match}
  cmp       cl, [eax+edx-4]      {Check Character Before}
  je        @@4                  {Exit on Match}
  sub       eax, 4               {All Characters Checked?}
  jg        @@Loop               {No, Loop}
@@NotFound:
  xor       eax, eax             {Result := 0}
  ret
@@4:
  dec       eax
@@3:
  dec       eax
@@2:
  dec       eax
  js        @@NotFound           {Match Found before First Char}
@@1:
  ret
@@Large:
  mov       ch, cl
  sub       eax, 16              {Length(S)-16}
  movd      xmm0, ecx
  pshuflw   xmm0, xmm0, 0
  pshufd    xmm0, xmm0, 0        {All 16 Bytes of xmm0 = cl}
  movdqu    xmm1, [eax+edx]      {Check Last 16 Chars}
  pcmpeqb   xmm1, xmm0
  pmovmskb  ecx, xmm1
  test      ecx, ecx             {Match Found?}
  jnz       @@Match              {Yes, Calc Result}
  add       eax, edx             {DQWORD Align [eax+edx]}
  and       eax, -16
  sub       eax, edx
@@LargeLoop:
  movdqa    xmm1, [eax+edx]      {Check 16 Chars per Loop}
  pcmpeqb   xmm1, xmm0
  pmovmskb  ecx, xmm1
  test      ecx, ecx             {Match Found?}
  jnz       @@Match              {Yes, Calc Result}
  sub       eax, 16
  jge       @@LargeLoop
  movd      ecx, xmm0            {Put SearchChar back into CL}
  add       eax, 16
  jnz       @@Loop               {Search any Remaining Chars}
  ret                            {No Remainder, Return 0}
@@Match:
  bsr       ecx, ecx             {Set Match Offset}
  lea       eax, [eax+ecx+1]     {Set Result}
end;
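
As a rough illustration of the 16-characters-per-step idea, the search can be
modelled in plain Pascal as sketched below. This is only a model, not a
translation of the function above: the helper names ChunkMask, HighBit and
CharPosRev_Model are made up, the alignment step is left out because it does
not change the result, the chunked path is used for any length of 16 or more
rather than only above 32, and AnsiString/AnsiChar are assumed. ChunkMask
stands in for pcmpeqb + pmovmskb, and HighBit stands in for bsr.

```pascal
program CharPosRevChunkModel;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ Bit J of the result is set when S[Offset + J + 1] = Ch (J = 0..15) }
function ChunkMask(Ch: AnsiChar; const S: AnsiString; Offset: Integer): Cardinal;
var
  J: Integer;
begin
  Result := 0;
  for J := 0 to 15 do
    if S[Offset + J + 1] = Ch then
      Result := Result or (Cardinal(1) shl J);
end;

{ Index of the highest set bit; Mask must be non-zero }
function HighBit(Mask: Cardinal): Integer;
begin
  Result := 31;
  while (Mask and (Cardinal(1) shl Result)) = 0 do
    Dec(Result);
end;

function CharPosRev_Model(Ch: AnsiChar; const S: AnsiString): Integer;
var
  Offset: Integer;
  Mask: Cardinal;
begin
  Offset := Length(S) - 16;          { zero-based start of the last chunk }
  while Offset >= 0 do
  begin
    Mask := ChunkMask(Ch, S, Offset);
    if Mask <> 0 then
    begin
      Result := Offset + HighBit(Mask) + 1;   { last match in this chunk }
      Exit;
    end;
    Dec(Offset, 16);
  end;
  { Fewer than 16 chars remain at the front: check them one at a time }
  Result := Offset + 16;
  while (Result > 0) and (S[Result] <> Ch) do
    Dec(Result);
end;

var
  S: AnsiString;
begin
  SetLength(S, 40);
  FillChar(S[1], Length(S), 0);
  WriteLn(CharPosRev_Model(#1, S));  { prints 0 }
  S[3] := #1;
  WriteLn(CharPosRev_Model(#1, S));  { prints 3, found by the remainder loop }
  S[30] := #1;
  WriteLn(CharPosRev_Model(#1, S));  { prints 30, found in the last chunk }
end.
```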

--
regards,
John

The Fastcode Project:
http://www.fastcodeproject.org/

