Re: Binary-mode i/o, width of char, endianness
From: T Koster (reply-to-group_at_use.net)
Date: 03/01/05
- Next message: Lawrence Kirby: "Re: Storing variable length data in file -- Paging"
- Previous message: Richard Bos: "Re: padding bits..."
- In reply to: infobahn: "Re: Binary-mode i/o, width of char, endianness"
- Next in thread: infobahn: "Re: Binary-mode i/o, width of char, endianness"
- Reply: infobahn: "Re: Binary-mode i/o, width of char, endianness"
- Reply: Lew Pitcher: "Re: Binary-mode i/o, width of char, endianness"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 01 Mar 2005 13:11:38 GMT
infobahn wrote:
> T Koster wrote:
>
>>I'm having some difficulty figuring out the most portable way to read 24
>>bits from a file. This is related to a Base-64 encoding.
>>
>>The file is opened in binary mode, and I'm using fread to read three
>>bytes from it. The question is though, where should fread put this? I
>>have considered two alternatives, but neither seems like a good idea:
>>
>>In most cases, the width of a char is 8 bits, so an array of 3 chars
>>would suffice, but the width of a char is guaranteed to be only *at
>>least* 8 bits, so the actual number of chars required would be 24 /
>>CHAR_BIT, rounded up. Since you can't round in a constant integral
>>expression, 3 chars is a good safe buffer size because it's guaranteed
>>to be at least 24 bits.
>
> To store BITS bits, you need at least (BITS + CHAR_BIT - 1) / CHAR_BIT
> bytes. If BITS is constant:
>
> #define BITS 24
>
> then:
>
> unsigned char buf[(BITS + CHAR_BIT - 1) / CHAR_BIT] = {0};
>
> is legal.
Ahh, good idea.
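A minimal sketch of how that buffer declaration could be paired with fread, assuming the stream was opened in binary mode. The names BUF_BYTES and read_group are hypothetical, introduced only for illustration; the ceiling division is the expression quoted above.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>

#define BITS 24
/* Ceiling division; valid in an integer constant expression,
   so it can be used as an array size. */
#define BUF_BYTES ((BITS + CHAR_BIT - 1) / CHAR_BIT)

/* Hypothetical helper: read one 24-bit group from a binary stream.
   Returns the number of bytes actually read, which is less than
   BUF_BYTES at end of file or on a read error. */
static size_t read_group(FILE *fp, unsigned char buf[BUF_BYTES])
{
    assert(fp != NULL);
    return fread(buf, 1, BUF_BYTES, fp);
}
```

A caller would compare the return value against BUF_BYTES to detect a short final group, which matters for Base-64 padding.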
>>However, since I need to be able to divide
>>those 24 bits into four 6-bit numbers, indices into the char array
>>become more complicated as the 6-bit numbers do not fall evenly on the
>>(presumably) 8-bit boundaries that indexes in the array would give me.
>
> So you need to mask and shift. If we assume that each octet of data
> is stored in a separate byte, then this isn't as hard as it sounds.
>
> /* 1. get bits 7 through 2 of first octet */
> num[0] = (buf[0] & 0xFC) >> 2;
> /* 2. get bits 1 and 0 of first octet, and bits 7 through 4 of
> second octet */
> num[1] = ((buf[0] & 0x03) << 4) | ((buf[1] & 0xF0) >> 4);
>
> etc.
>
>>If the width of a char is not 8 bits, then knowing which indices to look
>>at and shift/mask is even more difficult.
>
> See above if they're spread out, with 8 value bits to each byte
> (the remaining bits being unused). If they're packed in, you just
> have to be a little clever with CHAR_BIT. Once you start to analyse
> this problem, you'll see that it isn't as hard as it sounds.
We seem to be using the term 'byte' with different meanings...see below.
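For reference, a sketch of the mask-and-shift approach carried through all four 6-bit values. It assumes each array element holds exactly one octet in its low 8 bits (the "spread out" case), whatever CHAR_BIT happens to be. The function name split24 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Split three octets into four 6-bit values in the range 0..63.
   Assumes each element of in[] carries one octet in its low 8 bits. */
static void split24(const unsigned char in[3], unsigned char out[4])
{
    assert(in != NULL && out != NULL);
    /* bits 7..2 of octet 0 */
    out[0] = (in[0] & 0xFC) >> 2;
    /* bits 1..0 of octet 0, then bits 7..4 of octet 1 */
    out[1] = ((in[0] & 0x03) << 4) | ((in[1] & 0xF0) >> 4);
    /* bits 3..0 of octet 1, then bits 7..6 of octet 2 */
    out[2] = ((in[1] & 0x0F) << 2) | ((in[2] & 0xC0) >> 6);
    /* bits 5..0 of octet 2 */
    out[3] = in[2] & 0x3F;
}
```

With the octets 0x4D, 0x61, 0x6E as input, the four outputs are 19, 22, 5 and 46. Each result is then an index into the 64-entry Base-64 alphabet.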
>>As such, I thought of the
>>second option.
>>
>>The second option is to allocate the input buffer as simply one int
>>object that is guaranteed to be at least 24 bits wide: the long int,
>>which even has 8 bytes to spare.
>
> Well, at least 8 *bits* to spare. :-)
Certainly :)
>>fread can safely write 3 bytes of data
>>into a long int.
>
> Not necessarily. On platforms such as the kind you are worrying about
> (CHAR_BIT > 8), long int may well be fewer than four bytes wide!
>
> Consider a platform with 11-bit bytes. On such a platform, long ints
> may only occupy 3 bytes. On (perhaps more common) platforms with
> 16-bit or 32-bit bytes, long int may be only 2 bytes, or even 1 byte.
Hmmm, this appears to be becoming a question of terminology. I thought
that by definition, one byte is eight bits wide. I'm not using the C
type 'char' interchangeably with 'an int that is one _byte_ big'. When I
consider that CHAR_BIT may be greater than 8, I mean exactly that, and
not that a byte of storage on this platform has more than eight bits,
since I thought that was nonsense. That is, a char may occupy more than
one byte of storage, but a byte is still an 8-bit byte. Calling fread
and asking for three bytes implies that 24 bits will be read,
irrespective of platform, correct? As such, a long int, being
guaranteed to have at least 32 bits, is guaranteed to occupy at least
four bytes of storage, which is why I say that fread can safely store
three bytes (24 bits by definition) in a long int. Correct me if I'm
wrong here.
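On the terminology: in the C standard a "byte" is the unit that sizeof counts, and it is CHAR_BIT bits wide, so sizeof(char) is 1 by definition and fread's size and count arguments are measured in those units rather than in octets. A small sketch of a check under that definition follows; the helper name fits_in_long is hypothetical.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>

/* Hypothetical helper: would fread(&x, 1, nbytes, fp) stay inside a
   long object x? nbytes is counted in C bytes (sizeof units), each
   CHAR_BIT bits wide, not in octets. */
static int fits_in_long(size_t nbytes)
{
    assert(CHAR_BIT >= 8);   /* minimum the standard allows */
    return nbytes <= sizeof(long);
}
```

On an implementation where CHAR_BIT is 8, sizeof(long) is at least 4 and fits_in_long(3) holds. Where CHAR_BIT is 16 or 32, sizeof(long) may be 2 or 1, and the same call reports that three bytes would overrun the object.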
> I would stick to unsigned char for this project. Long ints will
> multiply your headaches, divide your attention, add to your
> worries, and subtract from your understanding (modulo their
> day-to-day uses, obviously).
Thanks,
Thomas.