Re: COMPARE HLL/ASM




santosh wrote:
....
If you don't mind, I specify in more detail:
* string is zero-padded and fixed-size, with no leading spaces, no end mark,
and UPPERCASE, i.e. "FEDCBA98"
* no error indication if any non-hex characters are found.
* just for compatibility with oldies: no SSE instructions allowed.

Okay. But for my C code the above conditions are already too late
because I finished coding it before I saw this post. I accept variable
length strings up to ffffffffffffffff with optional leading zero
padding. Both upper and lower case are accepted.

Invalid or malformed strings are rejected, not by the actual
conversion routine, but when the program starts up. This takes place before
the conversion loop is entered, so the time spent is irrelevant.

OK. I'll double my code to make it 128->64 bit and add the U/L-case yet.
And we just measure conversion time.

I used no optimisations for compiling.

* there are no C-functions available on most ASM tools,
so RDTSC seems to be the only choice.

Okay. I used RDTSC as well.

* running it several million times will just measure back-ground
noise from the OS, and STI/CLI may not work on every OS.
I'd recommend to run it three to four times and report the smallest
time interval.

Okay.

So here is my effort. The invocation and output are shown for 3 runs of
1 million loops each.

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40949977738136
end = 40950155398528
difference = 177660392

I see: ~177 ticks per pass

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40955083892880
end = 40955285678376
difference = 201785496

Now ~201, which may mean some more IRQs occurred.

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40956923434320
end = 40957101562616
difference = 178128296

OK, ~178 is close to the first.
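
The "report the smallest interval" rule suggested above can be applied to the three differences just shown. This is only a sketch of the arithmetic, not part of the posted program; the helper names smallest_interval and ticks_per_pass are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: smallest of n recorded RDTSC differences.
   The caller must pass n > 0. */
static uint64_t smallest_interval(const uint64_t *diff, size_t n) {
    uint64_t best = diff[0];
    for (size_t i = 1; i < n; i++)
        if (diff[i] < best) best = diff[i];
    return best;
}

/* Hypothetical helper: whole ticks per pass, rounded down. */
static uint64_t ticks_per_pass(uint64_t ticks, uint64_t passes) {
    return ticks / passes;
}
```

Fed with 177660392, 201785496 and 178128296 and one million passes, the smallest interval is the first one, which works out to 177 ticks per pass. The 201-tick run is the one to discard.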

I don't know how many seconds those measurements come up to. Here is the same
command timed by the UNIX 'time' utility.

$ time ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 41221246973984
end = 41221414368856
difference = 167394872

real 0m0.107s
user 0m0.108s
sys 0m0.000s

I have shown only one run to preserve space.

As the output shows, the code takes about one-tenth of a second for a
million passes over the test string. Obviously, larger test strings
(and higher loop counts) take more time than smaller ones.
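
To relate the tick counts to seconds, the timed run above gives a rough conversion factor. The sketch below is an assumption-laden estimate and not something the posted program does: 'real' time also covers program start-up and the printf calls, so the rate comes out somewhat low, and the counter need not tick at the core clock on every CPU. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: counter ticks per microsecond (roughly MHz),
   from a tick difference and the wall-clock seconds it took. */
static double ticks_per_microsecond(uint64_t ticks, double seconds) {
    return (double)ticks / (seconds * 1e6);
}

/* Hypothetical helper: convert a tick difference back to seconds,
   given a rate obtained from ticks_per_microsecond(). */
static double ticks_to_seconds(uint64_t ticks, double ticks_per_us) {
    return (double)ticks / ticks_per_us / 1e6;
}
```

With 167394872 ticks in 0.107 s this gives a rate of roughly 1564 ticks per microsecond, so the earlier 177660392-tick run corresponds to a little over 0.11 s.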

Real time is hard to compare on different machines/OS ...

[the code...]
It would be interesting to see it disassembled.
This FOR/WHILE/IF may end up with many branch instructions?
If I compare only the source size,
C coders seem to need many more keystrokes than ASM'ers ;)
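
On the branch question: one way to avoid the per-digit IF chain in C is a small arithmetic trick. The following is a sketch only, not the code posted in this thread, and no timing is claimed for it. It assumes ASCII and digits that were validated beforehand; hex_digit and hstoq_nobranch are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One hex digit to its value without comparisons.
   Assumes ASCII and an already validated digit (0-9, A-F, a-f):
   the low nibble is 0..9 for '0'..'9' and 1..6 for letters, and
   bit 6 is set only for letters, which adds the missing 9. */
static inline unsigned hex_digit(unsigned char c) {
    return (c & 0x0Fu) + 9u * (c >> 6);
}

/* Hypothetical variant: convert the first len digits of hs,
   most significant digit first. The only branch left is the loop. */
static uint64_t hstoq_nobranch(const char *hs, size_t len) {
    uint64_t total = 0;
    for (size_t i = 0; i < len; i++)
        total = (total << 4) | hex_digit((unsigned char)hs[i]);
    return total;
}
```

Called as hstoq_nobranch("aabbccddeeff0011", 16) it returns 0xaabbccddeeff0011, and upper case works the same way. Whether a given compiler turns this into fewer branch instructions than the IF chain would have to be checked in the disassembly.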

Ok, I'll check my code speed after the addons again.
Done yet.
__
wolfgang

___________
The complete code follows. It compiles with the GNU C compiler or the
Intel C compiler. The compilation command is:

For gcc:
gcc -Wall -W -std=gnu99 hstoq.c

For Intel C:
icc -Wall -W -std=c99 hstoq.c

Usage is: PROGRAM HEXSTRING [ITERATIONS]

PROGRAM is executable file name.
HEXSTRING is the string to be converted.
ITERATIONS is the number of times to loop. This is optional. If left out,
the default is 1,000,000 loops.

Code:
=====

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
/* #define DBG 1 */ /* uncomment for debug output */
#define DEFAULT_LOOPS 1000000L

static inline uint_fast64_t hstoq(const char *restrict, int);
static inline uint64_t fx_rdtsc(void);

int main(int argc, char **argv) {
char *hexstr;
int nch;
uint_fast64_t compare, val;
uint64_t start, end;
register long loops = DEFAULT_LOOPS;
const char *const usage = "Usage: a.out STR [n]\nSTR - hex string;"
" must _not_ have prefix or suffix\nn - no. of times to convert."
" default is %ld\n\n";

if (argc < 2) goto err_exit;
else {
hexstr = argv[1];
nch = strlen(argv[1])-1;
#ifdef DBG
printf("argv[1] = %p\thexstr = %p\tnch = %d\nstrlen(argv[1]) = %zu\n",
(void*)argv[1], (void*)hexstr, nch, strlen(argv[1]));
#endif
for (int ctr = nch; ctr >= 0; ctr--)
if (!isxdigit((unsigned char)argv[1][ctr])) goto err_exit;
if (argc > 2) {
errno = 0;
loops = strtol(argv[2], NULL, 0);
if (errno == ERANGE || loops <= 0) goto err_exit;
}
}
errno = 0;
compare = (uint_fast64_t)strtoull(argv[1], NULL, 16);
if (errno == ERANGE) goto err_exit;
printf("input: %s\tpasses: %ld\n", argv[1], loops);
start = fx_rdtsc();
/* loops > 0, so the body runs exactly 'loops' times */
while (loops > 0) { val = hstoq(hexstr, nch); loops--; }
end = fx_rdtsc();
if (val != compare) {
printf("MISMATCH!\n\tstrtoull = %" PRIxFAST64
"\n\thstoq = %" PRIxFAST64 "\n", compare, val);
}
printf("rdtsc:\n\tstart = %" PRIu64 "\n\tend = %" PRIu64
"\ndifference = %" PRIu64 "\n", start, end, end - start);
return 0;
err_exit:
perror(NULL);
printf(usage, DEFAULT_LOOPS);
return EXIT_FAILURE;
}

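static inline uint64_t fx_rdtsc(void) {
uint32_t lo, hi;
/* RDTSC leaves the 64-bit time-stamp counter in EDX:EAX */
__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
return ((uint64_t)hi << 32) | lo;
}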

static inline uint_fast64_t hstoq(const char *restrict hs, int nch) {
uint_fast64_t total = 0, currch;
uint_fast8_t hexch;
int shift = -4;
#ifdef DBG
printf("string = %s\nfrom hs = %s\n", hs, hs+nch);
#endif
while (nch >= 0) {
if (*(hs+nch) >= 0x61) hexch = *(hs+nch) - 0x57; /* 'a'..'f' -> 10..15 */
else if(*(hs+nch) >= 0x41) hexch = *(hs+nch) - 0x37; /* 'A'..'F' -> 10..15 */
else hexch = *(hs+nch) - 0x30; /* '0'..'9' -> 0..9 */
currch = hexch;
currch <<= (shift += 4);
total += currch;
#ifdef DBG
printf("nch = %d\tch = %c\thexch = %u\ncurrch = %"
PRIxFAST64 "\ttotal = %" PRIxFAST64 "\nshift = %d\n\n",
nch, hs[nch], (unsigned)hexch, currch, total, shift);
#endif
nch--;
}
return total;
}




