Re: Degenerate strcmp



On Aug 18, 10:19 am, Antoninus Twink <spam...@xxxxxxxxxxx> wrote:
On 18 Aug 2007 at 1:40, Eric Sosman wrote:
fishp...@xxxxxxxxxxx wrote:
One way I've seen strcmp(char *s1, char *s2) implemented is: return
immediately if s1==s2 (equality of pointers); otherwise do the usual
thing of searching through the memory at s1 and s2.
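
A minimal sketch of the shortcut described above, assuming a hand-written comparison function. The name my_strcmp and the assert on the arguments are illustrative and not taken from any particular C library. Both arguments must still point to null-terminated strings; the shortcut does not change that requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strcmp-like function with a pointer-equality shortcut.
   Precondition: s1 and s2 both point to null-terminated strings.
   If they do not, the behavior is undefined with or without the shortcut. */
int my_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    if (s1 == s2)   /* same object, so the contents cannot differ */
        return 0;

    /* The usual scan, comparing bytes as unsigned char. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    return (*p1 > *p2) - (*p1 < *p2);
}
```

For valid strings the early return gives the same answer as the full scan would, only sooner.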

Of course the reason for doing this is to save time in case equal
pointers are passed to strcmp. But it seems to me that this could create
an inconsistency in the degenerate case when s1 points to memory that is
not null-terminated, i.e. by some freak chance, all of the memory from
s1 till the computer reaches the end of all its memory pages (however
that works) doesn't contain a single null byte. In this case, strcmp
should not say that s1 and s2 are "equal strings" since neither is
actually a string (because not null terminated).

Is my thinking correct?

What you seem to have missed is that there is no "correct"
behavior in the case you describe: The behavior is undefined
because the arguments are not strings. Returning zero is one
possible behavior, a SIGSEGV is another, a graphic of a nasal
demon whistling "Dixie" while riding backwards on a bicycle
is yet another.

I think the subtle point is the following: a char * isn't actually the
same thing as a string. A char * is a pointer to some bytes of memory,
but if s is a char * then for s to be a string, we need the sequence
*s, *(s+1), *(s+2), ..., *(s+i), ... to actually contain a 0 byte for
some i. In practice memory will have 0 bytes all over the place, but
there's still a theoretical possibility that there won't be a zero byte
for any i until the memory space is completely exhausted.

What is "subtle" about this? It's just the definition of a string, and
it's very simple.

Maybe the program I put in the other thread

main() { printf("%d\n",strlen(malloc(0))); }

illustrates this more simply than strcmp: malloc(0) returns a pointer to
some random place in memory, and there's no absolute guarantee that a
0-byte will occur later in memory, so what gets printed could be a
random number or in theory the program could just never terminate.

I don't understand your point. You seem to be working hard to tell us
that a string is an array of chars up to and including the first zero-
valued character. Of course it is, since that's what it's defined to
be. If there is no zero-valued character in the array, then the array
doesn't contain a string.
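
When the size of the array is known, that definition can be checked directly. The following is only a sketch: the function name contains_string and its explicit size parameter are assumptions made for illustration. It relies on the standard memchr, which never reads past the n bytes it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first n bytes at buf include a zero-valued character,
   i.e. the array contains a string; returns 0 otherwise.
   Hypothetical helper: the caller must know the array size n. */
int contains_string(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

Given only a bare char * with no size, no such check is possible, which is why the responsibility falls on the caller.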

Part of the confusion seems to be the names: for example, strlen takes a
char * and returns an int. If the parameter is a string, then the
integer is the length of the string and that makes perfect sense. But
what strlen actually takes is a general char *, not necessarily a
string, and if you pass strlen a char * that isn't a string then you
need to think more carefully about how to interpret the return value of
strlen (or strlen might not terminate at all).

You don't need to be careful about anything if you do this, since you
cannot predict how the system will behave. What you need to be careful
about is not doing this in the first place. strlen() takes a string;
it is your responsibility to ensure that you only ever give it a
string; if you give it anything else, there's no saying what might
happen.
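
One way to meet that responsibility for freshly allocated memory is to write the terminator before anything reads the buffer. This is a sketch under stated assumptions: alloc_empty_string is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates size bytes and turns them into a valid
   empty string by storing the terminator in the first byte.
   Precondition: size > 0, so there is room for the terminator.
   Returns NULL if the allocation fails. */
char *alloc_empty_string(size_t size)
{
    assert(size > 0);

    char *s = malloc(size);
    if (s != NULL)
        s[0] = '\0';   /* the buffer now contains the string "" */
    return s;
}
```

After a successful call, strlen(s) is well defined and returns 0. Since strlen returns a size_t, the matching printf conversion is %zu. The caller releases the buffer with free.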
