Re: Bounds checked arrays

From: Rob Thorpe (robert.thorpe_at_antenova.com)
Date: 02/17/04


Date: 16 Feb 2004 16:49:41 -0800


"jacob navia" <jacob@jacob.remcomp.fr> wrote in message news:<c0m3sa$96g$1@news-reader5.wanadoo.fr>...
> As everybody knows, the C language lacks
> a way of specifying bounds checked arrays.
>
> This situation is intolerable for people who know
> that errors are easy to make, and making today's
> powerful microprocessors execute a few more instructions
> at each array access will not make any
> difference as far as speed is concerned.
>
> Not all C applications are real-time apps.
>
> Besides, there are viruses
> and other malicious software that exploit
> this problem in the C language to do their dirty
> work.
>
> Security means that we avoid the consequences
> of mistakes and expose them as soon as possible.
>
> It would be useful, then, to introduce into C
>
> #pragma STDC bounds_checking(ON/OFF)
>
> When the state of this toggle is ON, the compiler
> would accept declarations (like now)
>
> int array[2][3];
>
> The compiler would emit code that tests
> each index to make sure it is well formed.
> Each index runs from zero to n-1, i.e. it
> must be greater than or equal to zero and less
> than "n".
>
> In arrays with "n" dimensions, the compiler would
> emit code that tests all "n" indices before using
> them.
>
> Obviously, optimizations are possible, and
> good compilers will optimize away many tests,
> especially in loops. This is left unspecified.
>
> What is important is that array updates
> can't overflow into neighboring memory areas.
>
> How many machine instructions does this cost?
>
> Each test is a comparison of an index with a
> constant value, and a conditional jump. If the
> compiler only emits forward branches, the
> branch predictor can correctly predict that in
> most cases the branch will NOT be taken.
>
> In abstract assembly this is 4 instructions:
> test if index >= 0
> jump if not "indexerror"
> test if index < "n"
> jump if not "indexerror"
>
> where "n" is a compile time constant.
>
> That is something like 4 cycles, which a 2 GHz
> machine executes in 0.000 000 002 seconds (2 ns).
>
> Yes, table access is a common operation, but
> even millions of those tests would slow the program
> by only a negligible amount of time. We are not on the
> PDP-11 any more.
>
> This would make C a little bit easier to program,
> and the resulting programs of better quality.
> Buffer overflows would still happen, of course, but the
> language would limit their consequences by enforcing the bounds.
>
> By default the behavior is to stop the program.
> The user can override this, and he/she can specify
> different schemes for the actions to take when
> a buffer overflow happens.
>
> A simple strategy is to just do nothing.
>
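> #include <stdbool.h>
> #include <stdio.h>   /* for BUFSIZ */
>
> bool fn(const char *input)
> {
>     char tmpbuf[BUFSIZ];
>     size_t i = 0;
>     bool result = false;
>
>     while (*input) {
>         /* With bounds_checking ON, an out-of-range i would
>            make the compiler-generated test jump to indexerror. */
>         tmpbuf[i++] = *input++;
>     }
>     // Do things with the input
>     // set result
>     return result;
> indexerror:
>     return false;
> }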
>
> This function uses the built-in error checking
> to avoid any bad consequences of an overflow.
> If the input data is too long, it is malformed
> input that should be discarded.
>
> This frees the programmer from the tedious task
> of writing
> if (i >= sizeof(tmpbuf)) goto indexerror;
>
> at EACH array access. This can be done better
> by a machine and the compiler.
>
> This matters because a program like that today
> ***assumes*** that the input length
> can't be bigger than BUFSIZ.
>
> This is always *implicitly* assumed and
> nowhere *enforced*, by the way. The current
> state implies that catastrophic errors can happen
> if the index starts overwriting separate memory
> areas like the return address...
>
> Everyone knows this. Let's do something to
> stop it. Something simple, without too much
> fuss.
>
> In this case, on an index error the compiler-generated
> code jumps to the indexerror label and does what
> the programmer specifies there.
>
> The motto of C is: trust the programmer.
>
> We just have to allow him/her to specify what to do
> in case of overflow.
>
> "Trust the programmer" doesn't mean that we trust
> that he never makes a mistake, of course. It means
> that the programmer can specify what actions
> to take in case of error, and that the language
> provides sensible defaults.
>
> The default, then, is to terminate the program, as
> the assert() macro (another useful construct) does.
>
> Note that this proposal doesn't change anything
> in the language. It needs no new constructs, although
> compilers could provide extensions like the
> one proposed above.
>
> I propose then:
>
> #pragma STDC bounds_checking(ON/OFF)
>
> which should be written at file scope, outside any function.
>
> That's all.
>
> This proposal is an invitation to
> brainstorming. :-)
>
> I know that anyone using C is aware of this.
> So, let's fix it.

Sounds like a good idea. Since nothing gets standardised without
someone doing it first, why not implement it in LCC, and then see how many
problems are encountered?

Before you do, read:
http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

This is how it was done in TCC.

Perhaps try to make it work the same way as TCC, to get things started
without initial compatibility problems.


