Re: Hidden read of indeterminate memory
From: Holger Hasselbach (hasselbach_at_galad.com)
Date: 18 Dec 2003 14:13:21 -0800
pete <email@example.com> wrote:
> Chris Torek wrote:
> > I claim that one of these must win, and the one that must win is
> > the latter. But -- as I noted earlier -- no one can use this to
> > any useful effect in any (pure) C program, as they merely get a
> > "Schrodinger's Cat" (as it were) in the object representation
> > resulting from this kind of inspection of such an uninitialized
> > variable. ("Impure" or non-portable C code might use it to see
> > whether, for instance, the operating system on which the C system
> > runs clears pages before handing them out, or leaves cleartext
> > passwords and such accessible.)
> I understand you to mean that the use
> of an indeterminate unsigned char value,
> is unspecified behavior.
> Is that right ?
As far as I understand it now, thanks to the excellent explanations
from Chris, the C system of object storage can be expressed in three
layers:
Layer 3: Interpretation - Used values
Layer 2: Representation - Pure binary, unsigned char
Layer 1: Storage - Hardware
On the bottom is the hardware with the physical memory where the data
is stored and read from. Physical means, as usual, a real machine or a
virtual simulated one. It doesn't matter. The storage has to use the
provided addressing scheme and the memory format, e.g. the number of
bits per address.
On the second layer is the C data representation, based on a pure
binary representation with any bit used, no padding and no traps. It
is the compiler writer's task to provide the mapping between this
representation and the storage. For most of today's computers it is a
simple 1:1 mapping, because they are binary with a flat memory of a
given bitwidth. But there could be a computer that uses three states
per bit: 0, 1 and 2. The 'bit' would then be a ternary digit, a, erm,
'tit'. But because the representation is already an abstraction from
the hardware, it is possible to write a conforming compiler for this
system by implementing the mapping between the representation's bits
and the storage's tits. It could be a 1:1 mapping that simply ignores
the third state, or it could be a compressed mapping with some fancy
calculations using the powers of 2 and 3.
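To make the "compressed mapping" idea concrete, here is a rough sketch of how a compiler for such a ternary machine might store one layer-2 byte in layer-1 ternary digits. Everything in it is hypothetical: the names byte_to_trits, trits_to_byte and TRITS_PER_BYTE are mine, and it assumes an 8-bit byte. Since 3^6 = 729 >= 256, six ternary digits are enough for one byte.

```c
#include <stddef.h>

/* Hypothetical constant: 3^6 = 729 >= 256, so six ternary digits can
   hold any value of an 8-bit byte (assumes CHAR_BIT == 8). */
#define TRITS_PER_BYTE 6

/* Layer 2 -> layer 1: store one byte as six base-3 digits,
   least significant digit first. Each digit is 0, 1 or 2. */
void byte_to_trits(unsigned char byte, unsigned char trits[TRITS_PER_BYTE])
{
    unsigned int v = byte;
    for (size_t i = 0; i < TRITS_PER_BYTE; i++) {
        trits[i] = (unsigned char)(v % 3);
        v /= 3;
    }
}

/* Layer 1 -> layer 2: rebuild the byte from its six base-3 digits. */
unsigned char trits_to_byte(const unsigned char trits[TRITS_PER_BYTE])
{
    unsigned int v = 0;
    for (size_t i = TRITS_PER_BYTE; i-- > 0; )
        v = v * 3 + trits[i];
    return (unsigned char)v;
}
```

A C program running on top of this would never see the ternary digits. It would only see the byte that trits_to_byte() hands back, which is the point of layer 2 being an abstraction.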
On the third layer is the interpretation, the working with the values.
When you make an assignment like p=NULL, you are working with the
value NULL. The values are based on the representation, and there is a
mapping between them. It is the second level of abstraction. The
abstraction goes so far that there can be different representations
for the same value. When you compare these values, they are
required to compare as equal, because the comparison is defined on
the interpretation layer, not on the representation layer.
Layer 3: NULL == NULL              p == a
Layer 2: 0x00000000 != 0xffffffff  memcmp(&p, &a, sizeof(p)) != 0
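A system with two different null pointer representations is hard to come by, so as a stand-in here is a sketch of the same effect with a different type. It assumes IEEE 754 doubles, where +0.0 and -0.0 are the same value but differ in the sign bit. The function names are made up for illustration.

```c
#include <string.h>

/* Layer 3: compare two doubles by value. */
int same_value(double a, double b)
{
    return a == b;
}

/* Layer 2: compare the same two objects byte by byte. */
int same_representation(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

On an IEEE 754 implementation I would expect same_value(0.0, -0.0) to be true while same_representation(0.0, -0.0) is false: equal on layer 3, different on layer 2.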
And of course there can be trap representations without a mapping
to the interpretation at all. Obviously, the same representation maps
to different values for different object types, the mapping is
All C object types are working in layer 3. Working with those objects
is always working with values that are mapped from the representation
before a read and mapped to the representation after the write.
Layer 3: +-- Write p = a; Read <-+
Layer 2: xxxxxxxx xxxxxxxx
This holds for all object types, including unsigned char. The only
exception is that for unsigned char the mapping between layers 2 and 3
must be 1:1, without any traps. Thus unsigned char provides direct
access to the representation, even for indeterminate data.
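As a small illustration of that direct access, the following sketch walks over the bytes of any object through an unsigned char pointer and writes them out as hex digits. The function name representation_to_hex is my own invention, and the two-digits-per-byte format assumes CHAR_BIT == 8.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the representation of the object at obj (size bytes) as
   lowercase hex digits into out. The caller must provide room for
   2 * size + 1 characters. Assumes CHAR_BIT == 8. */
void representation_to_hex(const void *obj, size_t size, char *out)
{
    const unsigned char *bytes = obj;   /* layer 2 view of the object */

    for (size_t i = 0; i < size; i++)
        sprintf(out + 2 * i, "%02x", (unsigned int)bytes[i]);
    out[2 * size] = '\0';
}
```

For an unsigned char holding 0xAB this gives "ab", because value and representation coincide for that type. For an int or a pointer the digits depend on the implementation (byte order, padding), which is exactly the layer 2 versus layer 3 distinction.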
Both layers 2 and 3 are pure abstractions. They can be mapped 1:1 down
to layer 1, but it is also possible to have 3 completely different
representations of data on each layer.
Layer 3: p = 0; (Value 0)
Layer 2: 0xffffffff (Binary representation of a NULL pointer)
Layer 1: All tits 2 (0 mapped to 1-tit, 1 mapped to 2-tit)
Because layer 2 is a pure mapping without any data context, and all C
functionality happens in layer 3 with the context and traps, it makes
sense to allow undefined behaviour only on layer 3, not on layer 2.
This is what Chris meant by Schrodinger's Cat: You can move the
representation data of an object with unsigned char arrays and compare
the representation as equal, but you can't compare the object values
on layer 3 (except for unsigned char itself) when the data is
indeterminate. As soon as you read it you have undefined behaviour,
including the spontaneous change of the previously read values. Thus
you know that the values should be equal, but you can't prove it. ;)
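The "move and compare on layer 2" part can be sketched like this. The struct type and the function name are hypothetical, and the caller is assumed to pass an initialized source object, so the sketch itself stays well-defined. Only the byte-level steps are the ones that would remain meaningful for indeterminate data.

```c
#include <string.h>

/* Hypothetical example type; on many systems it contains padding
   bytes between tag and count. */
struct sample {
    char tag;
    long count;
};

/* Move the representation of *src into *dst by way of an unsigned
   char array, then compare the two representations. Returns 1 if
   they compare equal byte for byte, 0 otherwise. */
int move_representation(struct sample *dst, const struct sample *src)
{
    unsigned char bytes[sizeof *src];

    memcpy(bytes, src, sizeof bytes);   /* object -> unsigned char array */
    memcpy(dst, bytes, sizeof bytes);   /* unsigned char array -> object */
    return memcmp(dst, src, sizeof bytes) == 0;
}
```

Nothing in this function interprets tag or count as values. It stays entirely on layer 2, which is why it does not care what the bytes mean.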
Did I get it right? Or halfway right? Or right with some