Re: COMPARE HLL/ASM



Wolfgang Kern wrote:


santosh suggested (in reply to Wannabee):
...

Tell you what, if you are up to it, lets do a routine to convert an
hexadecimal string (without '0x' prefix or 'h' suffix) into the
corresponding 64 bit numerical value. You do yours in assembler and
I'll do mine in C and lets compare the relative speed for a few
million passes over the test string. We can use C's clock() function
or RDTSC, whichever you prefer. Respond if you are up to it and I'll
post my version's source and test run on my machine.

I think we've done something similar a few years ago.
But I like the idea so let's start with it:
If you don't mind, I specify in more detail:

* string is zero-padded fix-sized, no leading spaces ,no end-mark and
UPPERCASE ie: "FEDCBA98"
* no error indication if any non hex-characters are found.
* just for compatibilty with oldies: no SSE-instructions allowed.

Okay. But for my C code the above conditions are already too late
because I finished coding it before I saw this post. I accept variable
length strings upto ffffffffffffffff with optional leading zero
padding. Both upper and lower case are accepted.

Invalid or malformed strings are rejected, but not by the actual
conversion routine, but when program starts up. This takes place before
the conversion loop is entered, so the time spent is irrelevant.

I used no optimisations for compiling.

* there are no C-functions available on most ASM tools,
so RDTSC seems to be the only choice.

Okay. I used RDTSC as well.

* running it several million times will just measure back-ground
noise from the OS, and STI/CLI may not work on every OS.
I'd recommend to run it three to four times and report the smallest
time interval.

Okay.

So here is my effort. The invocation and output are shown for 3 runs of
1 million loops each.

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40949977738136
end = 40950155398528
difference = 177660392

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40955083892880
end = 40955285678376
difference = 201785496

$ ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 40956923434320
end = 40957101562616
difference = 178128296

I don't how many seconds those measurements come up to. Here is the same
command timed by the UNIX 'time' utility.

$ time ./hstoq_icc aabbccddeeff0011
input: aabbccddeeff0011 passes: 1000000
rdtsc:
start = 41221246973984
end = 41221414368856
difference = 167394872

real 0m0.107s
user 0m0.108s
sys 0m0.000s

I have shown only one run to preserve space.

As the output shows, the code takes about one-tenth of a second for a
million passes over the test string. Obviously, larger test strings
(and greater loops) take more time than smaller ones.

The complete code follows. It compiles with the GNU C compiler or the
Intel C compiler. The compilation command is:

For gcc:
gcc -Wall -W -std=gnu99 hstoq.c

For Intel C:
icc -Wall -W -std=c99 hstoq.c

Usage is: PROGRAM HEXSTRING [ITERATIONS]

PROGRAM is executable file name.
HEXSTRING is the string to be converted.
ITERATIONS is the number of times to loop. This is optional. If left out
the default is 1,000,000 loops.

Code:
=====

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
/* #define DBG 1 */ /* uncomment for debug output */
#define DEFAULT_LOOPS 1000000L

static inline uint_fast64_t hstoq(const char *restrict, int);
static inline uint64_t fx_rdtsc(void);

int main(int argc, char **argv) {
char *hexstr;
int nch;
uint_fast64_t compare, val;
uint64_t start, end;
register long loops = DEFAULT_LOOPS;
const char const *usage = "Usage: a.out STR [n]\nSTR - hex string;"
"must _not_ have prefix or suffix\nn - no. of times to convert."
" default is %ld\n\n";

if (argc < 2) goto err_exit;
else {
hexstr = argv[1];
nch = strlen(argv[1])-1;
#ifdef DBG
printf("argv[1] = %p\thexstr = %p\tnch = %d\nstrlen(argv[1])
= %d\n",
(void*)argv[1], (void*)hexstr, nch, strlen(argv[1]));
#endif
for (int ctr = nch; ctr >= 0; ctr--)
if (!isxdigit((int)argv[1][ctr])) goto err_exit;
if (argc > 2) {
errno = 0;
loops = strtol(argv[2], NULL, 0);
if (errno == ERANGE || loops <= 0) goto err_exit;
}
}
errno = 0;
compare = (uint_fast64_t)strtoull(argv[1], NULL, 16);
if (errno == ERANGE) goto err_exit;
printf("input: %s\tpasses: %ld\n", argv[1], loops);
start = fx_rdtsc();
while (loops >= 0) { val = hstoq(hexstr, nch); loops--; }
end = fx_rdtsc();
if (val != compare) {
printf("MISMATCH!\n\tstrtoull = %" PRIxFAST64
"\n\thstoq = %" PRIxFAST64 "\n", compare, val);
}
printf("rdtsc:\n\tstart = %" PRIu64 "\n\tend = %" PRIu64
"\ndifference = %" PRIu64 "\n", start, end, end - start);
return 0;
err_exit:
perror(NULL);
printf(usage, DEFAULT_LOOPS);
return EXIT_FAILURE;
}

static inline uint64_t fx_rdtsc(void) {
asm("rdtsc");
return;
}

static inline uint_fast64_t hstoq(const char *restrict hs, int nch) {
uint_fast64_t total = 0, currch;
uint_fast8_t hexch;
int shift = -4;
#ifdef DBG
printf("string = %s\nfrom hs = %s\n", hs-nch, hs);
#endif
while (nch >= 0) {
if (*(hs+nch) >= 0x61) hexch = *(hs+nch) - 0x57;
else if(*(hs+nch) >= 0x41) hexch = *(hs+nch) - 0x37;
else hexch = *(hs+nch) - 0x30;
currch = hexch;
currch <<= (shift += 4);
total += currch;
#ifdef DBG
printf("nch = %d\t*hs = %c\thexch = %c\ncurrch = %"
PRIxFAST64 "\ttotal = %" PRIxFAST64 "\nshift = %d\n\n",
nch, *hs, hexch, currch, total, shift);
#endif
nch--;
}
return total;
}

.



Relevant Pages

  • Re: String Memory Recovery
    ... Having two methods, Foo1(ByVal x as String), and Foo2. ... So timing calling these methods in large loops, the larger the string being passed in, the larger the difference between ByRef and ByVal. ... In the first test case I used local variable, later I moved that variable up to a form field level to ensure the compiler could not safely optimize it out. ... There was no difference between the Const variable and literal string in terms of results to each other, but compared to a string variable, they were significantly slower with the ByRef calls. ...
    (microsoft.public.vb.general.discussion)
  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... Big-O notation isn't mathematics per se, it's computer science notation ... "length of the string". ... outrun something compiled and optimized by a good C compiler. ... > either general computing culture or culture outside computing. ...
    (comp.programming)
  • Re: [EGN] Hoisting Loop Invariants (Was: Re: [EGN] Numerical Accuracy)
    ... compiler out there somewhere that did as you claim. ... > the programmer has this knowledge, then the programmer should not use ... >> string in a loop, regardless of the blatant inefficiency of doing so. ...
    (comp.programming)
  • Re: How to convert Infix notation to postfix notation
    ... and make all strings const save where the intent ... function whose contract is to change the string. ... the compiler "just" prevents the string ... try to do using the pointer you get. ...
    (comp.lang.c)
  • Re: PL/I string representations
    ... >> of the language, so it was interesting to me, hopefully it will be to ... I found a workable compiler for Fortran in 1971 (with a bug ... >> The specified length is the minimum, and each time a character ... >> string is assigned to E, the length is stored with it. ...
    (comp.programming)