Re: opensolaris C library vs. GNU C Library



jacob navia wrote:
This means that both algorithms converge with large strings, what is not
surprising since input/output from/to main memory becomes the dominant
factor of the computation, and both algorithms are then the same.

This will probably not be true until you start swapping to disk.
Memory is still O(n), but is mitigated by bandwidth capabilities of
your hardware while the O(n) algorithm of the strstr() itself is still
dependent on the efficiency of the algorithm.

With shorter strings GNU is twice as fast, but since the strings are
shorter the absolute difference is less (only 15ms after 10 thousand
runs!) so it is probably not worth the effort of understanding
the GNU version, full of goto's and a very complex control flow.

Because you're going to get so much more understanding out of "you
can't shift a negative number on some platforms". It *IS* worth a
minute or two of study, because the author didn't comment it, yet it
achieves better performance. If you figure it out, you may be able to
achieve that performance without the non-maintainability penality. You
instead prefer to stay in the dark for lack of wanting to look at a 100
lines of code?

Surely I would not love being the maintainer of that stuff.

In the case of strstr(), or other well defined library functions, its
actually quite easy to maintain it, regardless of how convoluted it is.
Just write tests for it.

The size of the GNU function is 260 bytes, the Sun version is
only 110 bytes.

The GNU version would be important when strstr makes more than 90%
of the run time of your program. For all other situations Sun's version
is the same

Yes, but *understanding* the code (for its performance) is the most
important alternative of all. Doing so allows you do this:

char * QEDstrstr (const char * d0, const char * d1) {
int i, j;
char c0, c1;

/* peel off case: strstr (*, "") */
if ('\0' == (c0 = d1[0])) return (char *) d0;

/* peel off case: strstr (*, one-character-string) */
if ('\0' == (c1 = d1[1])) {
char * d = (char *) d0;
for (; c0 != *d; d++) {
if ('\0' == *d) return NULL;
}
return d;
}

for (i = 0;; i++) {

/* Unrolled (once) first character test */
loop0:;
if (c0 != d0[i]) {
if ('\0' == d0[i]) return NULL;
if (c0 != d0[i+1]) {
if ('\0' == d0[i+1]) return NULL;
i+=2;
goto loop0;
}
i++;
}

/* Second character test */
if (c1 != d0[i+1]) continue;

/* Check beyond first two characters */
for (j = 2; d1[j] == d0[i+j]; j++) {
if ('\0' == d1[j]) return (char *) (d0 + i);
}
if ('\0' == d1[j]) return (char *) (d0 + i);
}
}

This achieves very close to the GNUstrstr performance (from my own
testing, on multiple compilers), while being somewhat more readable.
The trick, of course, was to unroll (the core reason why the GNU
version is faster), and peel off one case (which is just a standard
thing I do, as necessary). Its commented, and therefore even more
maintainable than the Sun version.

Now, would you say, this sort of optimization was not worth it, because
I cannot possibly optimize for multiple platforms/compilers at once
(even though I did just that)? That the micro-technique (unrolling) is
not worth it because you're hoping that your compiler will do it for
you (only one I tried succeeded in doing that)?

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

.



Relevant Pages

  • Re: Turing Machines and Physical Computation
    ... rather than adumbrated in that 1936 Turing paper On Computable ... ... any actual computer has finite memory, ... computer will run out of memory for some huge calculation. ... It is generally assumed that an algorithm must satisfy the following ...
    (comp.theory)
  • Re: Turing Machines and Physical Computation
    ... rather than adumbrated in that 1936 Turing paper On Computable ... ... any actual computer has finite memory, ... computer will run out of memory for some huge calculation. ... It is generally assumed that an algorithm must satisfy the following ...
    (sci.math)
  • [Meta] for the record - please ignore (was Re: how to transpose large matrix?)
    ... Isn't it a bit pointless to include that attribution line then? ... asked for a solution that would save memory, ... going to compare an Oalgorithm to an Oalgorithm. ... computational complexity of storing the entire matrix in memory. ...
    (comp.unix.shell)
  • Solving the memory issue of lock-free algorithms
    ... I read a lot about lock-free algorithm and tried to find an "elegant" ... solution to the memory problem that the algorithms poses. ... LFStackNode *Next; ... xor eax, eax ...
    (comp.programming.threads)
  • Re: Version after Version
    ... MMX, by doing 50 times two additions at a time. ... Many compilers & memory managers have at least the option to align data on ... > - compression and decompression works at the Bit level ... > LZW is the until recently proprietory algorithm of Uinisys ...
    (alt.comp.lang.borland-delphi)