Re: asm and C/C++ data types




<robertwessel2@xxxxxxxxx> wrote in message
news:1178681033.936358.193610@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On May 8, 8:08 pm, "Rod Pemberton" <do_not_h...@xxxxxxxxxxx> wrote:
<neo.hori...@xxxxxxxxx> wrote in message

news:1178582703.832489.266660@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

I know ,thanks to this (http://atrevida.comprenica.com/atrtut19.html)
tutorial, that C/C++ chars correspond to asm BYTEs, ints correspond to
WORDs, etc,

No. This is environment specific. (I'm surprised Robert didn't correct
you on that.)


Quite right. Chalk it up to sleep deprivation and the OP specifically
asking about x86...


A "byte" in C is not necessarily 8 bits. The ANSI C standard redefined a
byte from its common definition of 8 bits to make the C language more
portable. A "byte" in C is defined as the smallest addressable group of
bits large enough to represent all the characters in the character set.
For ASCII, this means at least 7 bits. A "char" in C is a C byte.
However, C also places minimum requirements on the sizes of types. For a
char, the standard requires the minimum size to be 8 bits.
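The two guarantees in the paragraph above can be checked at compile time. This is a minimal sketch assuming a C11 compiler (for static_assert); the helper name bits_in is hypothetical and not from the standard or from this thread.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

/* Both of these hold on every conforming implementation. */
static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof counts in C bytes, so char is 1");

/* Width in bits of an object that occupies `size` C bytes
   (padding bits, if any, are included). Hypothetical helper. */
size_t bits_in(size_t size)
{
    return size * CHAR_BIT;
}
```

On a typical 32-bit x86 compiler bits_in(sizeof(int)) is 32, while an assembler WORD on x86 is 16 bits, so the C type sizes have to be looked up per environment rather than assumed.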


Correct.


Therefore, a char in C is the
minimum multiple of the smallest group of addressable bits which is
greater than or equal to 8 bits. If the microprocessor's minimum
addressable size is greater than or equal to 8 bits (8, 16, or 24 bits,
etc.), then a byte is 8, 16, or 24 bits, respectively. If the
microprocessor's minimum addressable size is 3 bits, then a byte would be
3*3 or 9 bits, since it must be larger than or equal to 8 bits. If the
microprocessor's minimum addressable size is 7 bits, then a byte would be
2*7 or 14 bits.
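The arithmetic in the paragraph above can be written out as a small function. This is only a sketch of the rule as stated in that paragraph, not something the standard mandates; the name char_bits_by_rule is hypothetical.

```c
#include <assert.h>

/* Smallest multiple of `unit_bits` (the hardware's minimum addressable
   size, in bits) that is at least 8, per the rule described above. */
unsigned char_bits_by_rule(unsigned unit_bits)
{
    assert(unit_bits > 0);
    /* ceil(8 / unit_bits) using integer arithmetic */
    unsigned multiples = (8u + unit_bits - 1u) / unit_bits;
    return multiples * unit_bits;
}
```

With this, unit sizes of 8, 16 and 24 map to themselves, 3 maps to 9, and 7 maps to 14, matching the figures in the paragraph.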


No, the standard makes no such demand, just that chars be at least
eight bits. If that corresponds in some convenient way to what the
hardware supports, great, if not, the implementers will have to cope.

No, the standard makes no such demand,...

I'm not sure what you are disagreeing with... The only thing I see which I
should've stated better was "a char in C is the minimum" changed to "a char
in C is at least the minimum".

You stated the one paragraph above was correct, but then declared the
restatement of the exact same conditions in the second paragraph as
incorrect.

1) a C byte must be large enough to represent the entire character set - by
definition n1124 3.6
2) a C byte is an addressable group of bits sufficiently large to meet the
requirements in 1) - by definition n1124 3.6
3) a char is a C byte - by definition n1124 3.7.1
4) a char must be 8-bits or more - limits.h CHAR_BIT n1124 5.2.4.2.1

You go on to give examples (all of which I'll assume were intended as
contradictory):

A) the first violates the addressable requirement, but the implementors
coped...
B) the second complies by extending addressability with extra hidden
addressing information
C) the third complies
D) the fourth complies

So, I only see one which doesn't comply with what I said, but it also
doesn't comply with the spec.'s requirements either... Odd, I seem to
recall the addressable requirement as being more explicit than what is
stated in n1124.

Example A:
For example, C has been implemented on many word addressed machines
(IOW the minimum addressable unit is a word, typically somewhere
between 16-128 bits, with 30, 36 and 60 being somewhat common non-
power-of-two sizes), but still used 8/9/10 bit characters. In that
case any byte operations have to be synthesized out of various whole
word operations, usually involving a great deal of masking and merging
via the boolean operations.
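The masking and merging described in Example A looks roughly like the following. It is a sketch under assumptions not in the original post: a 32-bit word holding four 8-bit chars, with byte 0 in the least significant bits, and memory modeled as an array of words. The names load_byte and store_byte are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BYTES_PER_WORD = 4, BITS_PER_BYTE = 8 };  /* assumed layout */

/* Read byte `i` of word `word_index`: shift it down, mask off the rest. */
uint32_t load_byte(const uint32_t *mem, unsigned word_index, unsigned i)
{
    assert(i < BYTES_PER_WORD);
    return (mem[word_index] >> (i * BITS_PER_BYTE)) & 0xFFu;
}

/* Write byte `i` of word `word_index`: read the whole word, clear the
   target field, merge in the new value, write the whole word back. */
void store_byte(uint32_t *mem, unsigned word_index, unsigned i, uint32_t value)
{
    assert(i < BYTES_PER_WORD);
    unsigned shift = i * BITS_PER_BYTE;
    uint32_t mask = (uint32_t)0xFFu << shift;
    mem[word_index] = (mem[word_index] & ~mask) | ((value & 0xFFu) << shift);
}
```

Every char store becomes a read-modify-write of a full word, which is where the cost mentioned above comes from.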


Example B:
That's also one of the reasons the internals of pointers are so
pointedly undefined, and why char and void pointers have special
properties, and potentially different sizes. On a word addressed
machine with smaller than word sized chars, extra address information
has to be kept in a char pointer to point to the byte within the
word.

Machines that are word addressed have typically taken two
approaches to the addresses. First, and most convenient for a C
implementation, is that the integer value of word address corresponds
to the implied byte address (thus consecutive valid addresses might be
0, 4, 8, 12, 16, 20...). On those machines it's been fairly universal
to use the otherwise unused low bits for the byte sub-address.

Other word addressed machines number the words in unit increments
(IOW, 0, 1, 2, 3, 4, 5...). On those machines I know of three
different approaches that have been taken by C implementations.

First, a separate bit of storage is attached to char and void pointers
to hold the byte sub-address. That has the advantage of not reducing
the usable address space and leaving the word address part in a natural
format, but makes a hash of every reference or modification of a char
or void pointer.

Second, you have implementations of char and void pointers that shift
the word address left an appropriate amount, and then paste in a byte
offset in the now empty low bits. This creates a natural byte address
and leaves pointers a consistent size, but makes the pointers difficult
to use (since they can no longer be converted to other pointer types or
used for addressing without shifting things back to the right), and
disables a sizable fraction of the address space (the bits implicitly
shifted out the left end of the address register).

Third, implementations have put the byte offset in the top (leftmost)
bits of the char or void pointer. That leaves the addresses easy to
use, but puts the byte offset in an annoying spot, and still wastes a
big chunk of the address space.
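The second and third char pointer layouts from Example B can be modeled with plain integers. This is a sketch that assumes 32-bit pointers and four chars per word (so a 2-bit byte offset); all function names are hypothetical and no real compiler's representation is implied.

```c
#include <assert.h>
#include <stdint.h>

enum { OFFSET_BITS = 2 };  /* assumed: 4 chars per word */

/* Second approach: word address shifted left, byte offset in the low bits. */
uint32_t low_encode(uint32_t word_addr, uint32_t byte_off)
{
    assert(byte_off < (1u << OFFSET_BITS));
    assert(word_addr <= (UINT32_MAX >> OFFSET_BITS)); /* top bits are lost */
    return (word_addr << OFFSET_BITS) | byte_off;
}
uint32_t low_word(uint32_t p)   { return p >> OFFSET_BITS; }
uint32_t low_offset(uint32_t p) { return p & ((1u << OFFSET_BITS) - 1u); }

/* Third approach: byte offset in the top bits, word address left in place. */
uint32_t high_encode(uint32_t word_addr, uint32_t byte_off)
{
    assert(byte_off < (1u << OFFSET_BITS));
    assert(word_addr <= (UINT32_MAX >> OFFSET_BITS)); /* top bits are taken */
    return (byte_off << (32 - OFFSET_BITS)) | word_addr;
}
uint32_t high_word(uint32_t p)   { return p & (UINT32_MAX >> OFFSET_BITS); }
uint32_t high_offset(uint32_t p) { return p >> (32 - OFFSET_BITS); }
```

In the low-bit layout, adding 1 to the encoded pointer steps to the next char and carries into the word address automatically, so low_encode(5, 3) + 1 equals low_encode(6, 0). In the high-bit layout the word address is recovered with a single mask, but stepping to the next char needs explicit carry handling. Both layouts give up the top OFFSET_BITS of the address space, as the two asserts on word_addr show.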


Example C:
Of course there have also been implementations on word addressed
machines where a char is quite large (corresponding to a whole word).


Example D:
Also, while you'd usually expect that a C implementation on a byte
addressed platform like x86 would have 8 bit chars, that also is not
mandated in any way, and a perfectly legal implementation could
support 16 bit chars.
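One practical consequence of Example D: code that exchanges 8-bit octets with the outside world should not assume that an unsigned char holds exactly 8 bits. The following sketch masks explicitly, so it behaves the same whether CHAR_BIT is 8 or 16. The function names are hypothetical.

```c
#include <assert.h>

/* Store the low 32 bits of `v` as four octets, most significant first.
   Each octet sits in its own unsigned char, however wide that char is. */
void put_u32_octets(unsigned char out[4], unsigned long v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * (3 - i))) & 0xFFu);
}

/* Rebuild the value from four octets, most significant first. */
unsigned long get_u32_octets(const unsigned char in[4])
{
    unsigned long v = 0;
    for (int i = 0; i < 4; i++) {
        assert(in[i] <= 0xFFu);  /* only meaningful when CHAR_BIT > 8 */
        v = (v << 8) | in[i];
    }
    return v;
}
```

unsigned long is used because the standard guarantees it at least 32 bits, whereas int is only guaranteed 16.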


As usual, interesting...


Rod Pemberton

PS. Thanks for the 0xff 0xff response. I think I misunderstood some of
what they were saying due to vagueness. I think they were saying the EIP
wrap around doesn't fault on modern Intel CPUs. This was due to the
ancient 0xff 0xff functionality. I took it to mean that 0xff 0xff was
currently implemented.


