Re: Degenerate strcmp
- From: "J. J. Farrell" <jjf@xxxxxxxxxx>
- Date: Sat, 18 Aug 2007 03:59:36 -0700
On Aug 18, 10:19 am, Antoninus Twink <spam...@xxxxxxxxxxx> wrote:
On 18 Aug 2007 at 1:40, Eric Sosman wrote:
fishp...@xxxxxxxxxxx wrote:
One way I've seen strcmp(char *s1, char *s2) implemented is: return
immediately if s1==s2 (equality of pointers); otherwise do the usual
thing of searching through the memory at s1 and s2.
Of course the reason for doing this is to save time in case equal
pointers are passed to strcmp. But it seems to me that this could create
an inconsistency in the degenerate case when s1 points to memory that is
not null-terminated, i.e. by some freak chance, all of the memory from
s1 till the computer reaches the end of all its memory pages (however
that works) doesn't contain a single null byte. In this case, strcmp
should not say that s1 and s2 are "equal strings" since neither is
actually a string (because not null terminated).
Is my thinking correct?
What you seem to have missed is that there is no "correct"
behavior in the case you describe: The behavior is undefined
because the arguments are not strings. Returning zero is one
possible behavior, a SIGSEGV is another, a graphic of a nasal
demon whistling "Dixie" while riding backwards on a bicycle
is yet another.
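
For illustration, the implementation described in the original question might look roughly like the sketch below. The name my_strcmp is hypothetical and this is not any particular library's code. When both arguments really are strings, the shortcut cannot change the result, because a string always compares equal to itself; when they are not strings, the behavior is undefined with or without the shortcut.

```c
#include <stddef.h>

/* Hypothetical strcmp-like function with the pointer-equality shortcut.
   Both arguments are assumed to point to valid, zero-terminated strings. */
static int my_strcmp(const char *s1, const char *s2)
{
    /* Shortcut: identical pointers designate the same string. */
    if (s1 == s2)
        return 0;

    /* The "usual thing": walk both strings until they differ or s1 ends. */
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }

    /* Compare the differing bytes as unsigned char, as strcmp is specified to. */
    return (unsigned char)*s1 - (unsigned char)*s2;
}
```

Calling my_strcmp(p, p) returns 0 without reading any memory, which is exactly the case the question asks about.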
I think the subtle point is the following: a char * isn't actually the
same thing as a string. A char * is a pointer to some bytes of memory,
but if s is a char * then for s to be a string, we need the sequence
*s, *(s+1), *(s+2), ..., *(s+i), ... to actually contain a 0 byte for
some i. In practice memory will have 0 bytes all over the place, but
there's still a theoretical possibility that there won't be a zero byte
for any i until the memory space is completely exhausted.
What is "subtle" about this? It's just the definition of a string, and
it's very simple.
Maybe the program I put in the other thread
main() { printf("%d\n",strlen(malloc(0))); }
illustrates this more simply than strcmp: malloc(0) returns a pointer to
some random place in memory, and there's no absolute guarantee that a
0-byte will occur later in memory, so what gets printed could be a
random number or in theory the program could just never terminate.
I don't understand your point. You seem to be working hard to tell us
that a string is an array of chars up to and including the first zero-
valued character. Of course it is, since that's what it's defined to
be. If there is no zero-valued character in the array, then the array
doesn't contain a string.
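
The definition can be made concrete when the size of the array is known. The following is a minimal sketch; the helper name holds_string is hypothetical. It reports whether the first n bytes of an array include a zero byte, which is what it takes for the array to contain a string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the first n bytes of buf include a
   zero byte, i.e. the array contains a string. The caller must guarantee
   that buf really points to at least n readable bytes. */
static bool holds_string(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

Note that this only works because the caller supplies n. Given a bare char * with no size, there is no portable way to perform the check, which is why strlen and strcmp simply require a string.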
Part of the confusion seems to be the names: for example, strlen takes a
char * and returns an int. If the parameter is a string, then the
integer is the length of the string and that makes perfect sense. But
what strlen actually takes is a general char *, not necessarily a
string, and if you pass strlen a char * that isn't a string then you
need to think more carefully about how to interpret the return value of
strlen (or strlen might not terminate at all).
You don't need to be careful about anything if you do this, since you
cannot predict how the system will behave. What you need to be careful
about is not doing this in the first place. strlen() takes a string;
it is your responsibility to ensure that you only ever give it a
string; if you give it anything else, there's no saying what might
happen.
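
One way a caller can meet that responsibility is sketched below, assuming the data arrives as raw bytes with a known length. The helper name copy_as_string and its parameters are hypothetical. It copies the bytes into a buffer and terminates the buffer explicitly, so that what is handed to strlen is guaranteed to be a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dstsize - 1 bytes of raw data into dst
   and append a zero byte, so that dst is guaranteed to hold a string.
   Returns the length of the resulting string. */
static size_t copy_as_string(char *dst, size_t dstsize,
                             const char *src, size_t srclen)
{
    if (dstsize == 0)
        return 0;               /* no room even for the terminator */

    size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';              /* explicit terminator makes dst a string */

    return strlen(dst);         /* well defined: dst is now a string */
}
```

Note that strlen returns a size_t rather than an int, so its result is printed with %zu.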