Re: "Mastering C Pointers"....

From: Roose (nospam_at_nospam.nospam)
Date: 11/03/03


Date: Mon, 03 Nov 2003 05:50:46 GMT

Let me preface this with some meta-comments. If your goal is to learn C, by
all means go ahead and dive right into the C language. You only learn by
making mistakes. But it is a common fact in computers that you really only
master something when you have at least learned the level below it, i.e.
_what it is abstracting_. Then you see where the abstraction came from.

See these articles:

http://www.joelonsoftware.com/articles/LeakyAbstractions.html
http://biztech.ericsink.com/Abstraction_Pile.html

So if you configure OSes, it's good to learn some scripting. If you script,
good to know some compiled programming. If you program C, good to know some
assembly/computer architecture.

The idea was to give a simple concrete model that anyone can learn, but
don't get bogged down if your real goal is to learn C. Basically what I'm
describing is a von Neumann machine with some details (this is a very
influential hardware architecture from which today's PCs are descended,
roughly). Unfortunately, with a cursory Google search I couldn't find any
good links for that; maybe someone else can help out here.

> > Basically you have a CPU which has a clock say every microsecond if you're
> > on a gigahertz machine.
>
> Wouldn't that be "nanosecond"?

Yes, whoops.

> > 1. load them both from memory to CPU registers
> > 2. execute the add instruction and specify those two registers
> > 3. store the result back to memory
> >
>
> So memory is just dumb storage. I didn't realize that.

Yes, you can think of it as inert. The CPU controls everything, the
"brain". Memory is a separate unit which just stores bits. Same with the
hard disk. Notice that as you move from registers to memory to disk, the
access time increases, and so does the size of the medium.
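
The three quoted steps (load, add, store) can be sketched as a toy machine in C. This is only an illustration of the model, not how any real CPU is programmed: the struct, the sizes, and the function names are all made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 16   /* hypothetical tiny memory, counted in words */
#define NUM_REGS 4    /* hypothetical number of CPU registers */

/* Toy machine: memory just holds values, registers belong to the "CPU". */
struct machine {
    int32_t mem[MEM_SIZE];
    int32_t reg[NUM_REGS];
};

/* Step 1: load a word from memory into a register. */
void load(struct machine *m, int r, int addr)
{
    assert(r >= 0 && r < NUM_REGS && addr >= 0 && addr < MEM_SIZE);
    m->reg[r] = m->mem[addr];
}

/* Step 2: add two registers; memory is not involved at all. */
void add_regs(struct machine *m, int dst, int a, int b)
{
    m->reg[dst] = m->reg[a] + m->reg[b];
}

/* Step 3: store a register back to memory. */
void store(struct machine *m, int r, int addr)
{
    assert(r >= 0 && r < NUM_REGS && addr >= 0 && addr < MEM_SIZE);
    m->mem[addr] = m->reg[r];
}

/* Put x and y in memory, then add them the way the steps describe. */
int32_t add_in_memory(int32_t x, int32_t y)
{
    struct machine m = {0};
    m.mem[0] = x;
    m.mem[1] = y;
    load(&m, 0, 0);         /* reg0 = mem[0] */
    load(&m, 1, 1);         /* reg1 = mem[1] */
    add_regs(&m, 2, 0, 1);  /* reg2 = reg0 + reg1 */
    store(&m, 2, 2);        /* mem[2] = reg2 */
    return m.mem[2];
}
```

Calling add_in_memory(2, 3) walks through all three steps and hands back the word that ended up in memory.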

> > Memory is just a big array of bytes. If you have 1 gig of memory, pretend
> > it is addressed 0 .. 0x3FFFFFFF. This is 2^30-1, or 1 gig. 0xFFFFFFFF is
> > 2^32-1, which means 4 gigs. If you have heard Apple's stupid advertisements
> > that you can have more than 4 gigs of RAM in their PCs, that is because they
> > use more than 32 bits for pointers (i.e. the G5's 64-bit).
>
> Don't get THAT at all, but that's okay for now.

The main point here was that a pointer is a C language abstraction (for an
address in memory). A pointer at the hardware level _is an integer_.
Memory is just a big array of bytes again. Say you have 1 meg of memory,
then you can just number the bytes 0 .. 1,048,575 (2^20-1). Suppose you
have a variable numFiles that is the number of files in a directory = 55.
That variable must be stored somewhere. Say it is stored at the 200,005th
byte. Then a pointer to numFiles would have the numeric value 200,004 (get
used to indexing from 0 in C). That's it. Nothing else. It's like an
array index if you think of the array as all of memory.

e.g.
int numFiles = 55;
int* pNumFiles = &numFiles; // pNumFiles is a variable of pointer type,
                            // with the numeric value 200,004

Of course, as always, there are details, but I'm not going to clutter up
that simple concept with them. And this is very real, you can inspect
pointers in a debugger and see this (which was my suggestion).
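
If no debugger is handy, the "memory is one big array and a pointer is an index into it" picture can be approximated in plain C. This is a rough sketch: toy_memory, poke_int, peek_int and numeric_value are hypothetical names, the 1 meg array stands in for real memory, and converting a pointer to uintptr_t is implementation-defined (on common flat-address machines it gives the address you would see in a debugger).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pretend this 1 meg array is all of memory: bytes 0 .. 1,048,575. */
unsigned char toy_memory[1u << 20];

/* Store an int starting at the given byte number ("address"). */
void poke_int(size_t address, int value)
{
    assert(address + sizeof value <= sizeof toy_memory);
    memcpy(&toy_memory[address], &value, sizeof value);
}

/* Read the int that starts at the given byte number. */
int peek_int(size_t address)
{
    int value;
    assert(address + sizeof value <= sizeof toy_memory);
    memcpy(&value, &toy_memory[address], sizeof value);
    return value;
}

/* The integer behind a real C pointer (implementation-defined mapping). */
uintptr_t numeric_value(const void *p)
{
    return (uintptr_t)p;
}
```

With this, poke_int(200004, 55) plays the role of storing numFiles at the 200,005th byte, and the number 200004 is all the "pointer" amounts to in the toy model.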

Say you have a 32 bit register. After you learn binary/hex, you will see
why the number of different integers you can store in 32 bits is 2^32 = 4
gigs. A pointer can be thought of as an integer index, like I said. Thus
if a pointer is 32 bits, then you can only address 4 gigs of memory.
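
The arithmetic can be checked with a few lines of C. This is just a sketch; addressable_bytes is a made-up helper name, and it only covers pointer widths below 64 bits so the result fits in a uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* How many distinct byte addresses fit in a pointer of the given width. */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);            /* 2^64 itself would not fit in uint64_t */
    return (uint64_t)1 << bits;   /* 2^bits */
}
```

addressable_bytes(32) comes out as 4,294,967,296, i.e. 4 gigs, and addressable_bytes(20) as 1,048,576, the 1 meg used earlier.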

> In your previous post you recommended that I learn hexadecimal (which I
> have a slight handle on: 0-9-F with one character to describe identify
> a nibble.
>
> By this you mean learning to convert it to decimal, right?

You don't actually NEED to in order to do the exercise I suggested, since
many debuggers will display all values in decimal, but it helps to get into
the mindset. It is pretty simple if you are even mildly mathematical. Hex
is really a shorthand for binary, in some sense. There are probably plenty
of tutorials on the web, but afterwards you should be able to understand
these common equivalences:

2^8-1 = 255 = 0xFF = 1111 1111
2^8 = 256 = 0x100 = 1 0000 0000 = 0.25 K
2^16-1 = 65535 = 0xFFFF = 1111 1111 1111 1111
2^16 = 65536 = 0x10000 = 1 0000 0000 0000 0000 = 64 K
(spaces in binary added for clarity)

0x1 = 0001 b
0x2 = 0010 b
0x4 = 0100 b
0x8 = 1000 b

0x1 = 0001 b
0x3 = 0011 b
0x7 = 0111 b
0xF = 1111 b
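
To play with these equivalences, a small pair of C helpers is enough. This is a minimal sketch with made-up names (to_binary, from_binary); it assumes unsigned is at least 32 bits wide, which is true on typical desktop compilers but not guaranteed by the language.

```c
#include <assert.h>

/* Write the lowest `bits` bits of value as '0'/'1' text.
   out must have room for bits + 1 characters. */
void to_binary(unsigned value, int bits, char *out)
{
    assert(bits > 0 && bits <= 32);
    for (int i = 0; i < bits; i++) {
        int shift = bits - 1 - i;               /* most significant bit first */
        out[i] = ((value >> shift) & 1u) ? '1' : '0';
    }
    out[bits] = '\0';
}

/* Read binary text back into a number; spaces are skipped,
   so "1111 1111" is accepted. */
unsigned from_binary(const char *s)
{
    unsigned value = 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ')
            continue;
        assert(*s == '0' || *s == '1');
        value = (value << 1) | (unsigned)(*s - '0');
    }
    return value;
}
```

Each hex digit corresponds to exactly four of these bits, which is why hex works as a shorthand for binary.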

> > Also, you know that if's and goto's can be substituted for all for and while
> > loops.
>
> I do?

Yes, try taking a simple for loop or while loop and doing the exact same
thing with if's and goto's, if you don't see it right away. Of course this
is very bad programming practice, since loops make your logic much
clearer to the reader.
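
For reference, one way the exercise can come out. The function names are made up for the example; the two functions are meant to compute the same thing, once with a for loop and once with only if and goto.

```c
/* Sum of 0 .. n-1 with an ordinary for loop. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same loop spelled out with if and goto. */
int sum_goto(int n)
{
    int total = 0;
    int i = 0;           /* the loop's initialization */
top:
    if (!(i < n))        /* the loop's test */
        goto done;
    total += i;          /* the loop's body */
    i++;                 /* the loop's increment */
    goto top;            /* jump back and test again */
done:
    return total;
}
```

The goto version is roughly the shape a compiler produces: a test, a conditional jump out, the body, and an unconditional jump back.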

(If you're not totally familiar with loops in general, you might want to try
a higher-level language like Python/Java/C# before attempting C.)

> Sounds like the b command in sed. (oops! The b command in sed must be
> like the jump instruction in C :-)

Well "jump" is an assembly term for specific CPU instructions. Goto is the
equivalent in C. My point was that for and while loops compile down to
jumps and tests (instructions that return true or false, basically).

> Singe the hair off their balls: I'm getting a lot out of this.
>
> It's really hard to learn the basics of anything if you get too bogged down
> in exactness.

Good, I'm glad. This confirms an observation developed from some years of
teaching. Exactness isn't necessarily the problem, but you just have to
know WHAT you need to be exact about (i.e. not irrelevant details like
rarely used terminology).

Let me close with some more high level observations. What does C add on top
of this model (besides nice syntax, I'm talking ideas here)?

1) Platform independence -- instead of writing in native assembly, you write
in C, and then compilers for every platform translate your C program into a
stream of instructions (just byte data)
2) Constructs like functions, for and while loops, if's and switches, to
organize this enormous stream of instructions
3) A strong type system -- this can be confusing at first
    This is why I wanted to emphasize that pointers are just integers in
hardware. They're a C language construct. Same thing with characters/
character strings. Now floats ARE actually different in hardware, but we
won't get to that.
4) some other stuff which I don't care to think up now : )

The point of the type system is to catch mistakes at an obvious level.
Compiling a C program to see if all the types match up can catch a lot of
mistakes. It's sort of a sanity check, and it helps you structure your
program.
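
The remark in item 3 that characters and character strings are also just integers underneath can be seen in a few lines of C. This is a sketch under the assumption of an ASCII-style character set (letters contiguous, 'A' equal to 65); letter_index and count_bytes are made-up names.

```c
#include <stddef.h>

/* A char is a small integer, so ordinary arithmetic works on it.
   Assumes lowercase letters are contiguous, as in ASCII:
   'a' -> 0, 'b' -> 1, ..., 'z' -> 25. */
int letter_index(char c)
{
    return c - 'a';
}

/* A string is an array of those integers ending with the value 0,
   so its length is found by counting until the 0 byte. */
size_t count_bytes(const char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

The type system is what lets you write 'c' and "hello" instead of the raw numbers, while the hardware only ever sees the numbers.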

But again, you can read all you want, but if you can pull off the exercise
with the debugger I suggested, you will learn a whole lot.

Roose


