Re: asm and C/C++ data types



On May 9, 2:18 am, "Rod Pemberton" <do_not_h...@xxxxxxxxxxx> wrote:
<robertwess...@xxxxxxxxx> wrote in message
On May 8, 8:08 pm, "Rod Pemberton" <do_not_h...@xxxxxxxxxxx> wrote:
Therefore, a char in C is the minimum multiple of the smallest group of
addressable bits which is greater than or equal to 8-bits. If the
microprocessor's minimum addressable size is greater than or equal to
8-bits: 8 or 16 or 24-bits, etc., then a byte is 8 or 16 or 24-bits,
respectively. If the microprocessor's minimum addressable size is 3-bits,
then a byte would be 3*3 or 9-bits, since it must be larger than or equal
to 8-bits. If the microprocessor's minimum addressable size is 7-bits,
then a byte would be 2*7 or 14-bits.

No, the standard makes no such demand, just that chars be at least
eight bits. If that corresponds in some convenient way to what the
hardware supports, great; if not, the implementers will have to cope.
No, the standard makes no such demand,...

I'm not sure what you are disagreeing with... The only thing I see which I
should've stated better was "a char in C is the minimum" changed to "a char
in C is at least the minimum".

You stated the one paragraph above was correct, but then declared the
restatement of the exact same conditions in the second paragraph as
incorrect.

1) a C byte must be large enough to represent the entire character set - by
definition n1124 3.6
2) a C byte is an addressable group of bits sufficiently large to meet the
requirements in 1) - by definition n1124 3.6
3) a char is a C byte - by definition n1124 3.7.1
4) a char must be 8-bits or more - limits.h CHAR_BIT n1124 5.2.4.2.1
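
For what it's worth, points 3) and 4) can be checked on whatever compiler is at hand. The following is only a sketch: the macro STORAGE_BITS and the function check_char_properties are names made up for this illustration, not anything from the standard.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Storage size of a type in bits: sizeof counts C bytes, and each
   C byte is CHAR_BIT bits wide on this implementation.
   (STORAGE_BITS is a hypothetical helper, not a standard macro.) */
#define STORAGE_BITS(type) (sizeof(type) * (size_t)CHAR_BIT)

/* Hypothetical helper: aborts if a listed property fails here. */
static void check_char_properties(void)
{
    assert(sizeof(char) == 1);               /* 3) a char is one C byte */
    assert(CHAR_BIT >= 8);                   /* 4) at least 8 bits wide */
    assert(STORAGE_BITS(char) == CHAR_BIT);  /* follows from 3)         */
    assert(STORAGE_BITS(long) % CHAR_BIT == 0);
}
```

Note that nothing here asks what the hardware can address; CHAR_BIT is whatever the implementation chose.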

You go on to give examples (all of which I'll assume were intended as
contradictory):

A) the first violates the addressable requirement, but the implementors
coped...
B) the second complies by extending addressability with extra hidden
addressing information
C) the third complies
D) the fourth complies

So, I only see one which doesn't comply with what I said, but it also
doesn't comply with the spec.'s requirements either... Odd, I seem to
recall the addressable requirement as being more explicit than what is
stated in n1124.

Example A:

For example, C has been implemented on many word addressed machines
(IOW the minimum addressable unit is a word, typically somewhere
between 16-128 bits, with 30, 36 and 60 being somewhat common non-
power-of-two sizes), but still used 8/9/10 bit characters. In that
case any byte operations have to be synthesized out of various whole
word operations, usually involving a great deal of masking and merging
via the boolean operations.
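
A rough sketch of the masking and merging described above, simulated in portable C. The assumptions are mine, not from any particular machine: 32-bit words, four 8-bit chars per word, and the lowest-numbered char in the low-order bits. The names load_char, store_char and demo_word_memory are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CHARS_PER_WORD 4u    /* assumption: 32-bit words, 8-bit chars */
#define CHAR_MASK      0xFFu

/* Fetch one char: read the whole word, shift, mask. */
static uint32_t load_char(const uint32_t *mem, uint32_t char_index)
{
    uint32_t word  = mem[char_index / CHARS_PER_WORD];
    unsigned shift = 8u * (char_index % CHARS_PER_WORD);
    return (word >> shift) & CHAR_MASK;
}

/* Store one char: read the word, clear the old field, merge the
   new value in, write the whole word back. */
static void store_char(uint32_t *mem, uint32_t char_index, uint32_t value)
{
    uint32_t *word = &mem[char_index / CHARS_PER_WORD];
    unsigned shift = 8u * (char_index % CHARS_PER_WORD);
    uint32_t field = (uint32_t)CHAR_MASK << shift;
    *word = (*word & ~field) | ((value & CHAR_MASK) << shift);
}

/* Small self-check of the two routines. */
static void demo_word_memory(void)
{
    uint32_t mem[2] = { 0u, 0u };
    store_char(mem, 5u, 0xABu);   /* char 5 lives in word 1, slot 1 */
    assert(mem[0] == 0u);
    assert(mem[1] == 0x0000AB00u);
    assert(load_char(mem, 5u) == 0xABu);
}
```

A single char store turns into a read-modify-write of the containing word, which is where the cost comes from.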

Example B:





That's also one of the reasons the internals of pointers are so
pointedly undefined, and why char and void pointers have special
properties, and potentially different sizes. On a word addressed
machine with smaller than word sized chars, extra address information
has to be kept in a char pointer to point to the byte within the
word.

Machines that are word addressed have typically taken two
approaches to the addresses. First, and most convenient for a C
implementation, is that the integer value of word address corresponds
to the implied byte address (thus consecutive valid addresses might be
0, 4, 8, 12, 16, 20...). On those machines it's been fairly universal
to use the otherwise unused low bits for the byte sub-address.

Other word addressed machines number the words in unit increments
(IOW, 0, 1, 2, 3, 4, 5...). On those machines I know of three
different approaches that have been taken by C implementations.

First, a separate bit of storage is attached to char and void pointers
to hold the byte sub-address. That has the advantage of not reducing
the usable address space and leaving the word address part in a natural
format, but makes a hash of every reference or modification of a char
or void pointer.

Second, you have implementations of char and void pointers that shift
the word address left an appropriate amount, and then paste in a byte
offset in the now empty low bits. This creates a natural byte address
and leaves pointers a consistent size, but makes the pointers difficult
to use (since they can no longer be converted to other pointer types or
used for addressing without shifting things back to the right), and
disables a sizable fraction of the address space (the bits implicitly
shifted out the left end of the address register).

Third, implementations have put the byte offset in the top (leftmost)
bits of the char or void pointer. That leaves the addresses easy to
use, but puts the byte offset in an annoying spot, and still wastes a
big chunk of the address space.
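
The three char pointer layouts for unit-numbered word machines can be sketched as follows. This is an illustration under assumed parameters (32-bit address registers, four chars per word, so a 2-bit byte offset); every name in it is hypothetical and no real compiler's layout is being quoted.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 2u    /* assumption: 4 chars per word */
#define PTR_BITS    32u   /* assumption: 32-bit address register */

/* Approach 1: word address plus separately stored byte offset.
   Full address range, but bigger than a plain word pointer. */
struct wide_char_ptr { uint32_t word; uint8_t offset; };

/* Approach 2: word address shifted left, offset in the low bits. */
static uint32_t pack_low(uint32_t word, uint32_t offset)
{
    return (word << OFFSET_BITS) | offset;
}
static uint32_t low_word(uint32_t p)   { return p >> OFFSET_BITS; }
static uint32_t low_offset(uint32_t p) { return p & ((1u << OFFSET_BITS) - 1u); }

/* Approach 3: offset in the top bits, word address left in place. */
static uint32_t pack_high(uint32_t word, uint32_t offset)
{
    return (offset << (PTR_BITS - OFFSET_BITS)) | word;
}
static uint32_t high_word(uint32_t p)   { return p & (UINT32_MAX >> OFFSET_BITS); }
static uint32_t high_offset(uint32_t p) { return p >> (PTR_BITS - OFFSET_BITS); }

/* Small self-check of the trade-offs described above. */
static void demo_char_pointers(void)
{
    /* Approach 1 costs extra storage. */
    assert(sizeof(struct wide_char_ptr) > sizeof(uint32_t));

    /* Approach 2 gives natural byte arithmetic: +1 carries into
       the word address by itself... */
    assert(pack_low(7u, 3u) + 1u == pack_low(8u, 0u));
    /* ...but word addresses needing the top bits are lost. */
    assert(low_word(pack_low(0x40000000u, 0u)) != 0x40000000u);

    /* Approach 3 needs only a mask to recover the word address. */
    assert(high_word(pack_high(7u, 3u)) == 7u);
    assert(high_offset(pack_high(7u, 3u)) == 3u);
}
```

In approach 3, incrementing the pointer is the awkward part, since the carry out of the offset field has to be moved down to the word address by hand.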

Example C:

Of course there have also been implementations on word addressed
machines where a char is quite large (corresponding to a whole word).

Example D:



What I was objecting to was the notion that the minimum addressable
unit of the microprocessor had anything to do with the C
implementation. The char is the minimum addressable unit for the C
virtual machine. However that might map onto hardware facilities in a
given ISA is up to the implementation, and ISAs which do not provide
native access to what the implementation desires to use as a char,
will have to jump through some hoops. The four examples of word
addressed machine implementations of C that I described are all
perfectly valid (and existing) implementations of C on platforms that
do not have any particular ability to address a char as defined by the
C implementation. The C standard is specifically written to allow all
the contortions I described (and then some). That's why, for example,
on some implementations char and void pointers are bigger than other
types of pointers.

Your original post said "If the microprocessor's minimum addressable
size is 3-bits, then a byte would be 3*3 or 9-bits, since it must be
larger than or equal to 8-bits." This is incorrect. A perfectly
valid C implementation could implement 8/16/32-bit chars/shorts/longs,
and generate the (**bleech**) code needed to span those C types across
the ISA's 3-bit words, quite possibly adding various amounts of padding
between items for sanity. In fact, if you have such a 3-bit-word
machine, you might do exactly that, since vast swaths of the C code in
the world are going to break if you don't have 8-bit chars, and that's
a valuable resource you'd likely want to access.
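
To make the 3-bit case concrete, here is one way an implementation might span an 8-bit char across 3-bit cells, simulated in ordinary C. It assumes three cells per char with one padding bit, which is just one possible choice of padding; the names store_char8, load_char8 and demo_three_bit_cells are hypothetical.

```c
#include <assert.h>

#define CELL_BITS      3u   /* the hypothetical machine's addressable unit */
#define CELLS_PER_CHAR 3u   /* 9 bits of storage: 8 value bits + 1 padding */

/* Each array element simulates one 3-bit cell (values 0..7). */
static void store_char8(unsigned char *cells, unsigned index, unsigned value)
{
    unsigned char *c = &cells[index * CELLS_PER_CHAR];
    c[0] = (unsigned char)(value & 7u);                     /* bits 0-2 */
    c[1] = (unsigned char)((value >> CELL_BITS) & 7u);      /* bits 3-5 */
    c[2] = (unsigned char)((value >> 2u * CELL_BITS) & 3u); /* bits 6-7, pad bit 0 */
}

static unsigned load_char8(const unsigned char *cells, unsigned index)
{
    const unsigned char *c = &cells[index * CELLS_PER_CHAR];
    return ((unsigned)c[0]
          | ((unsigned)c[1] << CELL_BITS)
          | ((unsigned)c[2] << 2u * CELL_BITS)) & 0xFFu;
}

/* Small self-check: two 8-bit chars stored in six 3-bit cells. */
static void demo_three_bit_cells(void)
{
    unsigned char cells[6] = { 0 };
    store_char8(cells, 0u, 0xFFu);
    store_char8(cells, 1u, 0xA5u);
    assert(cells[2] == 3u);              /* padding bit stays clear */
    assert(load_char8(cells, 0u) == 0xFFu);
    assert(load_char8(cells, 1u) == 0xA5u);
}
```

From the C program's point of view CHAR_BIT is still 8 here; the ninth bit is invisible padding that only the generated code knows about.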

Now most C implementations do choose natively supported data types, or
straightforward extensions thereof (for example 32 bit longs on 16
bit machines that don't naturally support a 32 bit type), for all the
obvious reasons, but there is no particular requirement that they do
so. In fact, on many machines (including many of the word addressed
machines), there's darn good practical reason (namely compatibility
with the rest of the world) to hack out support for a more
conventional 8-bit char, no matter the pain level.
