Re: string comparison



yadurajj@xxxxxxxxx wrote:
> Hello i am newbie trying to learn C..I need to know about string
> comparisons in C, without using a library function,...recently I was
> asked this in an interview..I can write a small program but I was told
> that wouldn't it be wise to first get the length of the strings..if it
> doesn't match then they are not the same..I agreed...then he said..but
> again that would be an overhead first measuring the length...

It is *highly* unlikely that this comment was carried to its logical
conclusion. If he's talking about "performance", then there is a lot
more to this question than this.

> [...] and then
> doing a character by character comparison...I was confused and was
> wondering if anybody has an answer to this theory. I am only trying to
> undestand...

Ok, first of all, if the lengths are intrinsically available to you in
the first place (in most situations, when you are in control of all the
code and data formats, this is generally trivial to guarantee), then
sure, you can and perhaps should first compare the lengths, afterwhich
you want to perform the equivalent of a memcmp().

If you don't have the length, and your strings are just '\0' terminated
char * strings, then performing strlen()'s first and precomparing will
never be better in terms of performance. The reason is that the whole
cost of string comparison is the loop overhead, and memory traversal.
By calling strlen twice, you are basically (roughly) tripling the loop
overhead, and doubling the memory traversal cost.

In terms of amount of work done, the moment you have enough information
to know the two lengths are different, is the same moment you can know
that the substance of the two strings are different.

Ok, so here are some remaining questions:

1) How should one implement a standard strcmp? -- as I recall, this is
like 3 lines of code.

2) How might one implement a standard memcmp? (On the assumption that
lengths are always available to you, since it basically costs nothing
for this to be the case.) Doing this naively is easy; but trying to
take advantage of low-level hardware characteristics, such as
alignment, this can be hard if you are really intent on squeezing out
the maximum performance.

3) Is this string comparison isolated? For example, let us say that
you are performing a string comparison for the purpose of inserting
into a hash table to avoid inserting duplicates. Well in this case a
scalar "trace" of the string is precomputed anyways in the form of its
hash function mapping. So if you are store the hash values along with
each string then doing this pre-compare (like the length pre-compare,
except you should expect this to have far fewer false positives) will
improve average comparison performance.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

.



Relevant Pages

  • Re: Need quick lookup like Hashtable, but dont need to store value
    ... The cost of the individual operations. ... that container will require touching all 100 characters of the target string ... in order to compute its hash. ... For a sorted array, finding a string using a binary search will take at most ...
    (microsoft.public.dotnet.framework)
  • Re: C Slower than Python?
    ... seconds without the loop overhead. ... and by pretending that string lengths are available. ... Try adding two extra variables to store ...
    (comp.lang.c)
  • Re: Hey Karl...
    ... likely to see if the string lengths are the same. ... I would be amazed if the comparison code didn't compare lengths first. ... But you'd really need to be crunching the parsing for any such optimisation to ever be of any real significance. ...
    (microsoft.public.vb.general.discussion)
  • Re: 630mm guitar
    ... It's possible to use a 'normal' sized body with the shorter scale ... The lower string tension allows you to make the top and bracing just ... The biggest problems with shorter scale lengths have to do with the ...
    (rec.music.classical.guitar)
  • Re: write checks challenge
    ... I don't recall all the details, and frankly it is not worth my ... lengths run-time variable. ... much nicer if we didn't have to have the explicit length specification. ... If you special-case it to string ...
    (comp.lang.fortran)