Re: asm and C/C++ data types
- From: "Rod Pemberton" <do_not_have@xxxxxxxxxxx>
- Date: Wed, 9 May 2007 03:18:37 -0400
<robertwessel2@xxxxxxxxx> wrote in message
news:1178681033.936358.193610@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On May 8, 8:08 pm, "Rod Pemberton" <do_not_h...@xxxxxxxxxxx> wrote:
<neo.hori...@xxxxxxxxx> wrote in message
news:1178582703.832489.266660@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I know, thanks to this (http://atrevida.comprenica.com/atrtut19.html)
tutorial, that C/C++ chars correspond to asm BYTEs, ints correspond to
WORDs, etc.
No. This is environment specific. (I'm surprised Robert didn't correct
you on that.)
Quite right. Chalk it up to sleep deprivation and the OP specifically
asking about x86...
A "byte" in C is not necessarily 8 bits. The ANSI C standard redefined a byte from
its common definition of 8 bits to make the C language more portable. A
"byte" in C is defined as the smallest addressable group of bits large enough
to represent all the characters in the character set. For ASCII, this
means at least 7 bits. A "char" in C is a C byte. However, C also
places minimum requirements on the sizes of types. For a char, the standard
requires the minimum size to be 8 bits.
Correct.
Therefore, a char in C is the
minimum multiple of the smallest group of addressable bits that is
greater than or equal to 8 bits. If the microprocessor's minimum addressable
size is greater than or equal to 8 bits (8, 16, 24 bits, etc.), then a byte is
8, 16, or 24 bits, respectively. If the microprocessor's minimum
addressable size is 3 bits, then a byte would be 3*3 or 9 bits, since it
must be larger than or equal to 8 bits. If the microprocessor's minimum
addressable size is 7 bits, then a byte would be 2*7 or 14 bits.
No, the standard makes no such demand, just that chars be at least
eight bits. If that corresponds in some convenient way to what the
hardware supports, great, if not, the implementers will have to cope.
No, the standard makes no such demand,...
I'm not sure what you are disagreeing with... The only thing I see which I
should've stated better was "a char in C is the minimum" changed to "a char
in C is at least the minimum".
You stated the one paragraph above was correct, but then declared the
restatement of the exact same conditions in the second paragraph as
incorrect.
1) a C byte must be large enough to represent the entire character set - by
definition n1124 3.6
2) a C byte is an addressable group of bits sufficiently large to meet the
requirements in 1) - by definition n1124 3.6
3) a char is a C byte - by definition n1124 3.7.1
4) a char must be 8 bits or more - limits.h CHAR_BIT n1124 5.2.4.2.1
You go on to give examples (all of which I'll assume were intended as
contradictory):
A) the first violates the addressable requirement, but the implementors
coped...
B) the second complies by extending addressability with extra hidden
addressing information
C) the third complies
D) the fourth complies
So, I only see one which doesn't comply with what I said, but it also
doesn't comply with the spec.'s requirements either... Odd, I seem to
recall the addressable requirement as being more explicit than what is
stated in n1124.
Example A:
For example, C has been implemented on many word addressed machines
(IOW the minimum addressable unit is a word, typically somewhere
between 16-128 bits, with 30, 36 and 60 being somewhat common non-
power-of-two sizes), but still used 8/9/10 bit characters. In that
case any byte operations have to be synthesized out of various whole
word operations, usually involving a great deal of masking and merging
via the boolean operations.
Example B:
That's also one of the reasons the internals of pointers are so
pointedly undefined, and why char and void pointers have special
properties, and potentially different sizes. On a word addressed
machine with smaller than word sized chars, extra address information
has to be kept in a char pointer to point to the byte within the
word. Machines that are word addressed have typically taken two
approaches to the addresses. First, and most convenient for a C
implementation, is that the integer value of the word address corresponds
to the implied byte address (thus consecutive valid addresses might be
0, 4, 8, 12, 16, 20...). On those machines it's been fairly universal
to use the otherwise unused low bits for the byte sub-address. Other
word addressed machines number the words in unit increments (IOW, 0,
1, 2, 3, 4, 5...). On those machines I know of three different
approaches that have been taken by C implementations. First, a
separate bit of storage is attached to char and void pointers to hold
the byte sub-address. That has the advantage of not reducing the
usable address space and leaving the word address part in a natural
format, but makes a hash of every reference or modification of a char
or void pointer. Second, you have implementations of char and void
pointers that shift the word address left an appropriate amount, and
then paste in a byte offset in the now empty low bits. This creates a
natural byte address and leaves pointers a consistent size, but makes
the pointers difficult to use (since they can no longer be converted
to other pointer types or used for addressing without shifting things
back to the right), and disables a sizable fraction of the address
space (the bits implicitly shifted out the left end of the address
register). Third, implementations have put the byte offset in the top
(leftmost) bits of the char or void pointer. That leaves the
addresses easy to use, but puts the byte offset in an annoying spot,
and still wastes a big chunk of the address space.
Example C:
Of course there have also been implementations on word addressed
machines where a char is quite large (corresponding to a whole word).
Example D:
Also, while you'd usually expect that a C implementation on a byte
addressed platform like x86 would have 8 bit chars, that also is not
mandated in any way, and a perfectly legal implementation could
support 16 bit chars.
As usual, interesting...
Rod Pemberton
PS. Thanks for the 0xff 0xff response. I think I misunderstood some of
what they were saying due to vagueness. I think they were saying that EIP
wraparound doesn't fault on modern Intel CPUs, because of the
ancient 0xff 0xff functionality. I took it to mean that 0xff 0xff was
currently implemented.