Re: Size of char on a 64 bit machine

From: Dan Pop (Dan.Pop_at_cern.ch)
Date: 04/29/04


Date: 29 Apr 2004 11:50:57 GMT

In <a2efcaed.0404280753.3b95e6e0@posting.google.com> arunal2001@yahoo.co.in (aruna) writes:

>How is a character stored in a word aligned machine? Assuming on 64bit
>machine, 1 byte is reserved for a char, is it the case that only 1
>byte is used to store the character and the rest 7 bytes are wasted,
>or my assumption is wrong?

It is wrong, due to the special properties of the type unsigned char: it
can be used to examine the representation of any other type. Therefore,
this type cannot, by definition, have "wasted" bits (they are called
padding bits in the C99 standard).
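
As a minimal sketch of what "examine the representation" means in practice, the fragment below reads an arbitrary object one unsigned char at a time. The function names copy_representation and uchar_value_bits are made up for this illustration and do not come from the standard or from the original post.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Copy the object representation of *obj (size bytes) into out[],
   reading it one unsigned char at a time. */
static void copy_representation(const void *obj, size_t size,
                                unsigned char *out)
{
    const unsigned char *p = obj;  /* any object may be viewed this way */

    assert(obj != NULL && out != NULL);
    for (size_t i = 0; i < size; i++)
        out[i] = p[i];
}

/* Count the value bits of unsigned char by shifting UCHAR_MAX down
   to zero.  Because the type has no padding bits, the result is
   always equal to CHAR_BIT. */
static int uchar_value_bits(void)
{
    int bits = 0;

    for (unsigned long m = UCHAR_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}
```

Copying a double out through copy_representation and back in again yields the original value, which would not be guaranteed if unsigned char were allowed to drop or ignore any bits.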

So the possible sizes of char on a 64-bit machine are 8, 16, 32 and 64 bits.
If the size is less than 64 bits, sizeof(word) > 1 and multiple chars
can be stored in a word (the word can be aliased with an array of char).

There is only one known architecture with 64-bit word addressing (no
octet-based addressing) where C was implemented: the Cray vector
processor used in the old Cray supercomputers. char is an 8-bit type
on that particular platform.

>If my assumption is right, what are the
>performance issues in retrieving value of a character variable over
>other data types like integer, double or float.

Because the machine uses word addressing, char pointers need to store more
data than all other pointers (the address or position of the byte inside
the word). There are two ways of storing this additional information:
in the low bits, which optimises char pointer arithmetic, but requires
additional operations when the pointer is dereferenced, or in the upper
bits, which simplifies pointer dereferencing (the higher bits are
ignored, as the address space is only 48-bit) but complicates char
pointer arithmetic. I believe both ways have been used in different
implementations. Either way, after retrieving the word containing the
char, the char itself has to be extracted from the word, and this takes
some additional shifting and masking, so char access is slower. Not
much of a problem in practice, as these machines were not intended for
intensive character manipulations, but as number crunchers.
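
The two pointer layouts can be simulated in portable C. What follows is only a sketch under stated assumptions, not a description of any actual Cray compiler: memory is modelled as an array of 64-bit words, byte 0 is arbitrarily taken to be the least significant octet of a word, the "upper bits" variant keeps the offset in the top three bits, and only forward pointer arithmetic is handled. All names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { BYTES_PER_WORD = 8 };

/* Simulated word-addressed memory: only whole words can be loaded. */
static uint64_t memory[4];

/* Extract one 8-bit char from a loaded word by shifting and masking. */
static unsigned char extract(uint64_t word, unsigned byte)
{
    return (unsigned char)((word >> (byte * 8)) & 0xFF);
}

/* Scheme A: byte offset in the low 3 bits of the char pointer.
   Arithmetic is a plain add; dereferencing must split the pointer. */
static uint64_t low_make(uint64_t word, unsigned byte)
{
    assert(byte < BYTES_PER_WORD);
    return (word << 3) | byte;
}

static uint64_t low_add(uint64_t p, uint64_t n)
{
    return p + n;
}

static unsigned char low_load(uint64_t p)
{
    return extract(memory[p >> 3], (unsigned)(p & 7));
}

/* Scheme B: byte offset in the top 3 bits, word address in the low
   48 bits.  The word address is usable after a mask (real hardware
   would simply ignore the upper bits), but arithmetic has to carry
   from the offset field into the address field by hand. */
#define WORD_MASK ((UINT64_C(1) << 48) - 1)

static uint64_t high_make(uint64_t word, unsigned byte)
{
    assert(byte < BYTES_PER_WORD);
    return ((uint64_t)byte << 61) | (word & WORD_MASK);
}

static uint64_t high_add(uint64_t p, uint64_t n)
{
    uint64_t byte = (p >> 61) + n;                  /* offset + step */
    uint64_t word = (p & WORD_MASK) + byte / BYTES_PER_WORD;

    return ((byte % BYTES_PER_WORD) << 61) | (word & WORD_MASK);
}

static unsigned char high_load(uint64_t p)
{
    return extract(memory[p & WORD_MASK], (unsigned)(p >> 61));
}
```

In both schemes the load ends with the same shift-and-mask step in extract, which is the extra cost of char access mentioned above; the schemes differ only in whether the bookkeeping lands on the dereference (A) or on the arithmetic (B).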

The other, more common, 64-bit architectures use octet-based addressing
and things are no different from the more common 32-bit architectures.

Dan

-- 
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de

