Re: Hidden read of indeterminate memory

From: Holger Hasselbach (hasselbach_at_galad.com)
Date: 12/18/03


Date: 18 Dec 2003 14:13:21 -0800

pete <pfiland@mindspring.com> wrote:
> Chris Torek wrote:
> >
> > I claim that one of these must win, and the one that must win is
> > the latter. But -- as I noted earlier -- no one can use this to
> > any useful effect in any (pure) C program, as they merely get a
> > "Schrodinger's Cat" (as it were) in the object representation
> > resulting from this kind of inspection of such an uninitialized
> > variable. ("Impure" or non-portable C code might use it to see
> > whether, for instance, the operating system on which the C system
> > runs clears pages before handing them out, or leaves cleartext
> > passwords and such accessible.)
>
> I understand you to mean that the use
> of an indeterminate unsigned char value,
> is unspecified behavior.
> Is that right ?

As far as I understand it now, thanks to the excellent explanations
from Chris, the C system of object storage can be expressed in three
layers.

  Layer 3: Interpretation - Used values
  Layer 2: Representation - Pure binary, unsigned char
  Layer 1: Storage - Hardware

On the bottom is the hardware with the physical memory where the data
is stored and read from. Physical means, as usual, a real machine or a
virtual simulated one. It doesn't matter. The storage has to use the
provided addressing scheme and the memory format, e.g. the number of
bits per address.

On the second layer is the C data representation, based on a pure
binary representation with every bit used, no padding and no traps. It
is the compiler writer's task to provide the mapping between this
representation and the storage. For most of today's computers it is a
simple 1:1 mapping, because they are binary with a flat memory of a
given bitwidth. But there could be a computer that uses three states
per bit: 0, 1 and 2. The 'bit' would then be a ternary digit, a, erm,
'tit'. But because the representation is already an abstraction from
the hardware, it is possible to write a conforming compiler for this
system by implementing the mapping between the representation's bits
and the storage's tits. It could be a 1:1 mapping that simply ignores
the third state, or it could be a compressed mapping with some fancy
calculations using the powers of 2 and 3.

On the third layer is the interpretation, where you work with values.
When you make an assignment like p=NULL, you are working with the
value NULL. The values are based on the representation, and there is a
mapping between them. It is the second level of abstraction. The
abstraction goes so far that there can be different representations
for the same value. When you compare these values, they are
required to compare as equal, because the comparison is defined on
the interpretation layer, not on the representation layer.

  Layer 3:  NULL == NULL               p == a
  Layer 2:  0x00000000 != 0xffffffff   memcmp(&p, &a, sizeof(p)) != 0

And of course there can be trap representations, which have no mapping
to the interpretation at all. Obviously, the same representation maps
to different values for different object types: the mapping is
type-sensitive.

All C object types work in layer 3. Working with those objects
is always working with values: a read maps the representation to a
value, and a write maps the value back to a representation.

  Layer 3:  +-- Write    p = a;    Read <--+
            V                              |
  Layer 2:  xxxxxxxx                xxxxxxxx

This holds for all object types, including unsigned char. The only
exception is that for unsigned char the mapping between layers 2 and 3
must be 1:1, without any traps. Thus unsigned char provides direct
access to the representation, even for indeterminate data.

Both layers 2 and 3 are pure abstractions. They can be mapped 1:1 down
to layer 1, but it is also possible for the same data to have a
completely different form on each of the three layers.

  Layer 3: p = 0; (Value 0)
  Layer 2: 0xffffffff (Binary representation of a NULL pointer)
  Layer 1: All tits 2 (0 mapped to 1-tit, 1 mapped to 2-tit)

Because layer 2 is a pure mapping without any data context, and all C
functionality happens in layer 3 with the context and traps, it makes
sense to allow undefined behaviour only on layer 3, not on layer 2.
This is what Chris meant by Schrodinger: You can move the
representation data of an object with unsigned char arrays and find
that the representations compare equal, but you can't compare the
object values on layer 3 (except for unsigned char itself) when the
data is indeterminate. As soon as you read it you have undefined
behaviour, including the spontaneous change of the previously read
values. Thus you know that the values should be equal, but you can't
prove it. ;)

Did I get it right? Or halfway right? Or right with some
over-interpretation?

Holger


