Re: Bounds checked arrays

From: Keith Thompson (kst-u_at_mib.org)
Date: 02/16/04


Date: Mon, 16 Feb 2004 01:36:10 GMT


"jacob navia" <jacob@jacob.remcomp.fr> writes:
[...]
> It would be useful then, if we introduced into C
>
> #pragma STDC bounds_checking(ON/OFF)
>
> When the state of this toggle is ON, the compiler
> would accept declarations (like now)
>
> int array[2][3];
>
> The compiler would emit code that tests
> each index for a well formed index.
> Each index runs from zero to n-1, i.e.
> must be greater than zero and less than
> "n".
[...]

To be useful, such bounds checking cannot be limited to explicitly
declared arrays. You've mentioned requiring explicit bounds on
function arguments, but that's only one special case. For example:

    int arr[10];
    int *ptr = arr;
    ...
    int x = ptr[15];

The declaration of arr and ptr, the initialization of ptr, and the
evaluation of ptr[15] can be almost arbitrarily complex and can be
widely separated, even in separate source files. If you want to do
reliable bounds checking on ptr[15], you have to carry the bounds
information along with the pointer.

In other words, you need fat pointers.

The type of a pointer is known at compilation time, so you don't need
to store, for example, the fact that ptr points to, say, a 4-byte
type. You do need to store the lower and upper bounds of the object
into which it points. For example (let's call the raw address
returned by malloc() M):

    char *ptr = malloc(100); /* ptr contains (M, 0, 100) */
    ptr += 20; /* ptr contains (M+20, -20, 80) */
    char c1 = ptr[-10]; /* ok, -10 is within the bounds */
    char c2 = ptr[50]; /* ok, 50 is within the bounds */
    char c3 = ptr[80]; /* failure, 80 is just beyond the upper bound */

Every pointer arithmetic operation has to update the bounds
information as well as the address, and can fail if the resulting
address points outside the original object. Every pointer dereference
operation has to check the bounds. The bounds information could be
stored either as byte offsets or as a count of the pointed-to type; I
have a hunch byte offsets would work better, but I'm not certain.

Finally, you still need to have a way to represent a pointer with no
associated bounds information, on which no checking can be done.
Unless you're designing a whole new system from scratch, some code in
C-with-bounds-checking ("C[]"?) will still have to deal with pointers
from system interfaces and from code written in C-without-bounds-checking.

Perhaps, rather than a "#pragma", bounds-checked pointers should be
specified with a new type qualifier.

Note that the presence of absence of bounds-checking, however it's
specified, will change the sizes of pointer objects, which will affect
memory layouts, which will inevitably reveal or hide subtle bugs.
There will be programs that work correctly (and don't trigger any
checks) with bounds-checking enabled, but will fail mysteriously when
it's disabled.

Another possibility might be to store the bounds information
elsewhere, rather than in each pointer object. The system could keep
a table of all known objects, including base address and size, that's
updated whenever an object is created or destroyed. But then each
pointer operation would have to do a (hash?) lookup on the table,
which would slow things down even more.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"