Re: asm and C/C++ data types
- From: "robertwessel2@xxxxxxxxx" <robertwessel2@xxxxxxxxx>
- Date: 8 May 2007 20:23:53 -0700
On May 8, 8:08 pm, "Rod Pemberton" <do_not_h...@xxxxxxxxxxx> wrote:
<neo.hori...@xxxxxxxxx> wrote in message
news:1178582703.832489.266660@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I know, thanks to this (http://atrevida.comprenica.com/atrtut19.html)
tutorial, that C/C++ chars correspond to asm BYTEs, ints correspond to
WORDs, etc.,
No. This is environment specific. (I'm surprised Robert didn't correct you
on that.)
Quite right. Chalk it up to sleep deprivation and the OP specifically
asking about x86...
A "byte" in C is not necessarily 8 bits. The ANSI C standard redefined a byte
from its common definition of 8 bits to make the C language more portable. A
"byte" in C is defined as the smallest addressable group of bits large enough
to represent all the characters in the basic character set. For ASCII, this
means at least 7 bits. A "char" in C is a C byte. However, C also places
minimum requirements on the sizes of types. For a char, the standard
requires the minimum size to be 8 bits.
Correct.
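Those guarantees can be stated directly in C. Below is a minimal sketch, assuming a C11 compiler (for `_Static_assert`). `CHAR_BIT` and `UCHAR_MAX` are the standard `<limits.h>` macros, while `c_byte_bits` is only an illustrative name:

```c
#include <limits.h>

/* Guarantees every conforming C implementation makes, checked at
   compile time with C11 _Static_assert. */
_Static_assert(CHAR_BIT >= 8, "a C byte (char) has at least 8 bits");
_Static_assert(sizeof(char) == 1, "sizeof counts in C bytes, i.e. in chars");
_Static_assert(UCHAR_MAX >= 255, "unsigned char holds at least 0..255");

/* Bits in one C byte on this implementation: 8 with the usual x86
   compilers, but any value of 8 or more is permitted. */
int c_byte_bits(void)
{
    return CHAR_BIT;
}
```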
Therefore, a char in C is the
minimum multiple of the smallest group of addressable bits which is greater
than or equal to 8 bits. If the microprocessor's minimum addressable size
is greater than or equal to 8 bits (8 or 16 or 24 bits, etc.), then a byte is
8 or 16 or 24 bits, respectively. If the microprocessor's minimum
addressable size is 3 bits, then a byte would be 3*3 or 9 bits, since it
must be larger than or equal to 8 bits. If the microprocessor's minimum
addressable size is 7 bits, then a byte would be 2*7 or 14 bits.
No, the standard makes no such demand, just that chars be at least
eight bits. If that corresponds in some convenient way to what the
hardware supports, great; if not, the implementers will have to cope.
For example, C has been implemented on many word addressed machines
(IOW the minimum addressable unit is a word, typically somewhere
between 16 and 128 bits, with 30, 36 and 60 being somewhat common
non-power-of-two sizes), but still used 8/9/10 bit characters. In that
case any byte operations have to be synthesized out of various whole
word operations, usually involving a great deal of masking and merging
via the boolean operations.
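Here is a minimal sketch of that masking and merging. It uses a hypothetical 36-bit word-addressed machine that packs four 9-bit chars per word, simulated with an array of `uint64_t`. The machine parameters and every name (`load_word`, `store_word`, `load_char`, `store_char`, `memory`) are illustrative assumptions and do not describe any particular implementation. Byte addresses must stay below `MEM_WORDS * CHARS_PER_WORD`, because there is no bounds check:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical word-addressed machine: memory is read and written only
   a whole 36-bit word at a time, and the C implementation packs four
   9-bit chars into each word, the first char in the high-order bits.
   The machine is simulated with an array of uint64_t. */
#define WORD_BITS      36
#define SIM_CHAR_BITS  9
#define CHARS_PER_WORD (WORD_BITS / SIM_CHAR_BITS)
#define SIM_CHAR_MASK  ((UINT64_C(1) << SIM_CHAR_BITS) - 1)
#define MEM_WORDS      16

static uint64_t memory[MEM_WORDS];       /* only the low 36 bits are used */

/* The only operations the "hardware" offers: whole-word load and store. */
static uint64_t load_word(size_t waddr)              { return memory[waddr]; }
static void     store_word(size_t waddr, uint64_t w) { memory[waddr] = w; }

/* Bit position of char slot 0..3 within its word (slot 0 is leftmost). */
static unsigned slot_shift(size_t slot)
{
    return (unsigned)((CHARS_PER_WORD - 1 - slot) * SIM_CHAR_BITS);
}

/* Byte load: fetch the containing word, shift the wanted char down to
   the bottom, and mask off everything else. */
unsigned load_char(size_t byte_addr)
{
    uint64_t w = load_word(byte_addr / CHARS_PER_WORD);
    return (unsigned)((w >> slot_shift(byte_addr % CHARS_PER_WORD)) & SIM_CHAR_MASK);
}

/* Byte store: a read-modify-write of the containing word.  AND clears
   the old char's field, OR merges the new value in, and the whole word
   is written back. */
void store_char(size_t byte_addr, unsigned value)
{
    size_t   waddr = byte_addr / CHARS_PER_WORD;
    unsigned shift = slot_shift(byte_addr % CHARS_PER_WORD);
    uint64_t w     = load_word(waddr);

    w &= ~(SIM_CHAR_MASK << shift);
    w |= ((uint64_t)value & SIM_CHAR_MASK) << shift;
    store_word(waddr, w);
}
```

In this sketch, a char load costs a word load plus a shift and a mask. A char store costs a full read-modify-write of the word, even though only 9 of its 36 bits change.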
That's also one of the reasons the internals of pointers are so
pointedly undefined, and why char and void pointers have special
properties, and potentially different sizes. On a word addressed
machine with smaller than word sized chars, extra address information
has to be kept in a char pointer to point to the byte within the
word. Machines that are word addressed have typically taken two
approaches to the addresses. First, and most convenient for a C
implementation, is that the integer value of a word address corresponds
to the implied byte address (thus consecutive valid addresses might be
0, 4, 8, 12, 16, 20...). On those machines it's been fairly universal
to use the otherwise unused low bits for the byte sub-address. Other
word addressed machines number the words in unit increments (IOW, 0,
1, 2, 3, 4, 5...). On those machines I know of three different
approaches that have been taken by C implementations. First, a
separate bit of storage is attached to char and void pointers to hold
the byte sub-address. That has the advantage of not reducing the
usable address space and leaving the word address part in a natural
format, but makes a hash of every reference or modification of a char
or void pointer. Second, you have implementations of char and void
pointers that shift the word address left an appropriate amount, and
then paste in a byte offset in the now empty low bits. This creates a
natural byte address and leaves pointers a consistent size, but makes
the pointers difficult to use (since they can no longer be converted
to other pointer types or used for addressing without shifting things
back to the right), and disables a sizable fraction of the address
space (the bits implicitly shifted out the left end of the address
register). Third, implementations have put the byte offset in the top
(leftmost) bits of the char or void pointer. That leaves the
addresses easy to use, but puts the byte offset in an annoying spot,
and still wastes a big chunk of the address space.
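The following minimal sketch shows how a char pointer might be encoded under each of those approaches. It assumes a hypothetical machine with 32-bit address registers and four chars per word, chosen only to keep the arithmetic small. All names (`bn_make`, `fat_make`, `low_make`, `top_make` and their helpers) are illustrative, and real implementations differed in detail:

```c
#include <stdint.h>

/* Hypothetical word-addressed machine with 32-bit address registers and
   four chars per word, so a byte sub-address needs 2 bits.  Each scheme
   below builds a char pointer for byte b (0..3) of word w. */
#define OFFSET_BITS 2u
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1u)
#define ADDR_BITS   32u

/* Byte-numbered words (0, 4, 8, ...): the low bits of a word address
   are always zero, so the sub-address simply goes in them. */
uint32_t bn_make(uint32_t word_addr, unsigned b) { return word_addr | (b & OFFSET_MASK); }

/* Unit-numbered words (0, 1, 2, ...), first approach: a separate field
   holds the byte offset.  The word part stays natural, but a char
   pointer is now larger than a word pointer. */
struct fat_char_ptr {
    uint32_t word;   /* ordinary word address */
    unsigned byte;   /* 0..3 within that word */
};

struct fat_char_ptr fat_make(uint32_t w, unsigned b)
{
    struct fat_char_ptr p = { w, b & OFFSET_MASK };
    return p;
}

/* Second approach: shift the word address left and paste the offset into
   the freed low bits.  Yields a natural byte address, but the top
   OFFSET_BITS of the word address fall off, and the value must be
   shifted back right before it can address a word. */
uint32_t low_make(uint32_t w, unsigned b) { return (w << OFFSET_BITS) | (b & OFFSET_MASK); }
uint32_t low_word(uint32_t p)             { return p >> OFFSET_BITS; }
unsigned low_byte(uint32_t p)             { return p & OFFSET_MASK; }

/* Third approach: leave the word address in place and put the offset in
   the top bits.  Masking those bits off gives a usable word address, but
   word addresses of 2^30 and above can no longer be represented. */
#define TOP_SHIFT (ADDR_BITS - OFFSET_BITS)
#define WORD_MASK ((UINT32_C(1) << TOP_SHIFT) - 1u)

uint32_t top_make(uint32_t w, unsigned b) { return ((uint32_t)(b & OFFSET_MASK) << TOP_SHIFT) | (w & WORD_MASK); }
uint32_t top_word(uint32_t p)             { return p & WORD_MASK; }
unsigned top_byte(uint32_t p)             { return (unsigned)(p >> TOP_SHIFT); }
```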
Of course there have also been implementations on word addressed
machines where a char is quite large (corresponding to a whole word).
Also, while you'd usually expect that a C implementation on a byte
addressed platform like x86 would have 8 bit chars, that also is not
mandated in any way, and a perfectly legal implementation could
support 16 bit chars.
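Because wider chars are legal, portable code should not hardcode 8. It can use `CHAR_BIT` where it means "bits per C byte" and mask explicitly where it means octets. The following is a minimal sketch: the macro and function names are illustrative, and the 32-bit big-endian octet layout is only an assumed example of an external data format:

```c
#include <limits.h>
#include <stddef.h>

/* Bits in a type's storage (its object representation, padding bits
   included).  sizeof counts C bytes, not octets, so multiply by CHAR_BIT
   rather than by a hard-coded 8. */
#define STORAGE_BITS(type) (sizeof(type) * CHAR_BIT)

/* Store the low 32 bits of v as four octets, most significant first.
   Each element is masked to 8 bits, so the same octet values result
   whether unsigned char is 8, 9, 16 or more bits wide.  uint8_t is
   avoided on purpose: it need not exist when CHAR_BIT > 8. */
void put_be32(unsigned char out[4], unsigned long v)
{
    for (size_t i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (24 - 8 * i)) & 0xFFu);
}

/* Inverse of put_be32: reassemble the value from four octets. */
unsigned long get_be32(const unsigned char in[4])
{
    unsigned long v = 0;
    for (size_t i = 0; i < 4; i++)
        v = (v << 8) | (in[i] & 0xFFu);
    return v;
}
```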