Re: What's wrong with strcpy/strcat in this "C for Tcl" experiment?

From: Robert Heller (heller_at_deepsoft.com)
Date: 10/09/04


Date: Sat, 09 Oct 2004 01:46:12 +0200


  Erik Leunissen <look@the.footer.invalid>,
  In a message on Fri, 08 Oct 2004 22:53:59 +0200, wrote :

EL> Helmut Giese wrote:
EL>
EL> >
EL> > because you're lucky - or rather, because you're unlucky enough that
EL> > it doesn't fail _obviously_, too.
EL> >
EL>
EL> [snipped some]
EL>
EL> >
EL> > Lemme sum it up this way: The fact that a program does not segfault is
EL> > in general not sufficient evidence for its correctness :)
EL>
EL>
EL> Yes, I understand this. Thank you.
EL>
EL> I conclude from this that whenever declaring a pointer, I should
EL> allocate memory before writing to it. Correct?
EL>
EL> I've got this learning text right before me, which says that the
EL> following is allowed:
EL>
EL> char * p
EL> p = "Whatever"
EL>
EL> This is confusing. It seems to me that this code is also writing to
EL> undefined space.

No, it is not. strcpy() is not the same as pointer assignment.

This is what is happening:

Somewhere in (typically read-only) memory, either program text or a
special segment set aside for constant objects (like strings), there are
9 bytes of memory containing the 9 characters:

        'W', 'h', 'a', 't', 'e', 'v', 'e', 'r', '\0'

The compiler has also allocated, in variable space, a pointer object
named 'p'. On a 32-bit x86 type machine (your standard PC), this is
four bytes (a long word). Initially it contains some random garbage.

When you execute the statement:

         p = "Whatever";

the long word (pointer) named 'p' gets set to the address of the
read-only memory containing the 9 bytes shown above. The string
constant "Whatever" is NOT copied to anywhere. No memory is allocated.
p's previous value is discarded.
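
To see for yourself that only an address moves, here is a small sketch.
The function name is made up for this example, and I have declared the
pointers 'const char *' (my addition, not in your learning text) because
the literal must not be written to:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if, after plain assignment, both
   pointers refer to the very same bytes (so nothing was copied). */
int same_storage_after_assignment(void)
{
    const char *p;   /* holds garbage until something is assigned */
    const char *q;

    p = "Whatever";  /* p now holds the address of the literal */
    q = p;           /* copies the address only, not the characters */

    /* 8 letters plus the terminating '\0' make the 9 bytes above */
    assert(strlen(p) + 1 == 9);

    return p == q;
}
```

Calling same_storage_after_assignment() should give 1: two arrows, one
stack of blocks.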

When you do:

        strcpy(p,"Whatever");

Something *completely* different happens. The constant string
"Whatever" is *copied* to memory, starting at the location *pointed to*
by p. This had better be a location you can write to; otherwise, you
will get a segfault if you are lucky, or silently overwrite something
else if you are not.

A simplistic version of strcpy might look like:

char *strcpy(char *dest, const char *src)
{
        char *ptr;
        /* copy each character up to, but not including, the NUL */
        for (ptr = dest; *src != '\0'; ptr++) *ptr = *src++;
        *ptr = '\0';    /* terminate the copy */
        return dest;
}
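
If you want to try the loop above without clashing with the real
strcpy() from the C library, a renamed copy might look like the sketch
below. The name copy_string and the NULL check are my own additions for
illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the simplistic strcpy shown above.
   dest must point to writable memory large enough for src plus
   its terminating '\0'. */
char *copy_string(char *dest, const char *src)
{
    char *ptr;

    assert(dest != NULL && src != NULL);

    /* copy one character at a time until the NUL in src is reached */
    for (ptr = dest; *src != '\0'; ptr++)
        *ptr = *src++;

    *ptr = '\0';   /* terminate the copy */
    return dest;
}
```

With 'char box[9];' as the destination, copy_string(box, "Whatever")
fills all 9 bytes, because box is memory you own and can write to.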

Now you can emulate this function with a batch of alphabet blocks and
some boxes. Think of the various pointers as pointing to spaces where
things can be (memory locations). You start out with some pointers
(pieces of paper, each with an arrow on it and a name: 'p',
'strcpy:dest', 'strcpy:src', and 'strcpy:ptr'). Toss the arrow named
'p' to some random location and then point the arrow named
'strcpy:dest' to whatever 'p' randomly points to (which can be
anywhere -- at one of your boxes, the floor, your belly button, etc.).
Point 'strcpy:src' to the stack of blocks spelling "Whatever" (don't
forget a blank block at the end!). Put 'strcpy:ptr' next to
'strcpy:dest'. Look at the block 'strcpy:src' points to. If it is not
the NUL (blank) block, copy it to where 'strcpy:ptr' (and so
'strcpy:dest') points. Oops! That is your belly button, not a box! You
can't put a block there. Yell 'Segmentation Fault'!

Calling malloc() is like getting a box off the shelf. Now you have a
place to put your blocks. Start out with 'p' / 'strcpy:dest' pointing
at this box. Be sure to get a big enough box (one space per letter,
plus one for the blank block). If the box is too small you will have a
problem getting all of the blocks into it.
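
In code, getting a big enough box could go roughly like this. It is
only a sketch; the function name is invented for the example:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fetches a box just big enough for src and
   copies src into it. Returns NULL if no box was available.
   The caller must free() the result when done with it. */
char *copy_into_new_box(const char *src)
{
    char *p;

    assert(src != NULL);

    p = malloc(strlen(src) + 1);   /* +1 for the blank (NUL) block */
    if (p == NULL)
        return NULL;               /* the shelf was empty */

    strcpy(p, src);                /* safe: p points at memory we own */
    return p;
}
```

Typical use would be 'char *p = copy_into_new_box("Whatever");',
followed by a check for NULL, and 'free(p);' once you are finished.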

Pointer assignment is easy, just move the pointer to wherever you need
to point it to. The blocks don't move, just the pointer. Note: 'ptr++'
is shorthand for 'ptr = ptr + 1', which moves the *pointer* ptr to the
next memory location (next block). The expression '*ptr = *src'
*copies* the block at the location *pointed to* by src to the location
*pointed to* by ptr -- the *contents* of one memory location are copied
to another memory location.
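
The two operations can be pulled apart in a tiny sketch like the one
below (the function name is made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies one block from where src points to
   where ptr points, then moves ptr to the next location.
   Returns the moved pointer. */
char *copy_one_and_advance(char *ptr, const char *src)
{
    assert(ptr != NULL && src != NULL);

    *ptr = *src;   /* the contents are copied; no pointer moves */
    ptr++;         /* the pointer moves; no contents change */

    return ptr;
}
```

Starting from 'char box[2] = { 'x', 'y' };', a call such as
copy_one_and_advance(box, "W") changes box[0] to 'W', leaves box[1]
alone, and hands back a pointer to box[1].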

Note: plain C does not have 'strings' in the same sense as Tcl (and
other 'high level' languages). It has arrays of char and pointers to
chars. In C pointers and arrays are often interchangeable -- esp. when
you assign an *array* to a pointer -- when this happens the compiler
takes it to mean the address (pointer) of the first element of the
array:

        char temp[10], *p;

        p = temp; /* this is short for 'p = &temp[0];' -- this is a
                     pointer assignment operation. */

        p = "Whatever";
        /* this is roughly short for
           'static char anonymous[] = "Whatever"; p = &anonymous[0];'
           except that you must not write to the anonymous array. */
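
A quick way to convince yourself of the "often interchangeable, but not
identical" part is a sketch like this one (the function name is
invented for the example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if writes made through p show up
   in temp, i.e. p really points at temp's first element. */
int pointer_aliases_array(void)
{
    char temp[10];
    char *p;

    p = temp;          /* same as p = &temp[0]; */
    p[0] = 'W';        /* writes into temp[0] through the pointer */
    *(p + 1) = '\0';   /* p[1] and *(p + 1) mean the same thing */

    /* Not fully interchangeable: sizeof sees the whole array here,
       whereas sizeof p would only give the size of the pointer. */
    assert(sizeof temp == 10);

    return p == &temp[0] && temp[0] == 'W' && temp[1] == '\0';
}
```

pointer_aliases_array() should return 1: p and temp name the same box,
even though temp is the box itself and p is just an arrow to it.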

Pointers can be confusing at first, until you look at them in some
'concrete' fashion.

(You don't really have to raid some small child's block collection to
perform the above 'experiment'. You can use a pencil and a piece of
paper to do the same thing.)

EL>
EL>
EL> Greetings,
EL>
EL> Erik
EL> --
EL> leunissen@ nl | Merge the left part of these two lines into one,
EL> e. hccnet. | respecting a character's position in a line.
EL>
EL>

                                     \/
Robert Heller ||InterNet: heller@cs.umass.edu
http://vis-www.cs.umass.edu/~heller || heller@deepsoft.com
http://www.deepsoft.com /\FidoNet: 1:321/153

               

