Bits!

From: Ross A. Finlayson (spamtrap_at_crayne.org)
Date: 09/28/04

  • Next message: Ross A. Finlayson: "Re: Kind of new: function implementation questions, MASM"
    Date: Tue, 28 Sep 2004 08:17:15 +0000 (UTC)
    
    

    Programming with Bits

    It's common knowledge that a computer stores data only in bits: the 1's
    and 0's that comprise the binary alphabet 0,1.

    While that is so, it's not always easy or efficient to deal with bits
    when programming a computer, because the computer's method to address
    data works with blocks of bits, 8, 16, 32, or 64 bits at a time. Thus
    it is not always simple or efficient to interpret or express long
    sequences of bits that aren't used a fixed, hardwired amount of bits at
    a time.

    For example, the data of interest might be bits 4-19, numbered from 0,
    in a sequence of four bytes, 32 bits. You can only address specifically
    any one of those bytes, and then it is necessary to pick apart the byte
    and put together bits from different bytes, and it is cumbersome to
    handle that. As well, if you want to emit, say, three bits then six
    bits then two bits into a couple bytes, again it involves setting the
    contents of one byte, then modifying it and the next byte also for the
    next bit sequence, and then modify that next byte again, and then
    keeping track of how to modify that next one.

    There are various requirements to access the bits. Let's focus on
    reading bits first. Basically then there is some input data, of some
    arbitrary length. Let's say the length of the input data, in bytes, is
    known, although because the bitstring is not necessarily a multiple of
    eight bits, the last zero to seven bits of the byte sequence are
    meaningless. For simplicity, say there is a multiple of eight bits
    stored in a byte sequence.

    0x12 0x34

    That is two bytes, and the bit representation is thus:

    0001 0010 0011 0100

    If that is data stored two or four bits at a time, it is not so bad:
    just read two bits four times for each byte for two bits. Yet, when the
    data is stored three bits at a time, then you can read the first three
    bits from the first byte, and the next three bits from the first byte,
    but then you have to read two bits from the first byte and then the
    first bit from the second byte, and combine them, and then in reading
    the second byte it is necessary to read the second, third, and fourth
    bit, and then the fifth, sixth, and seventh, and then there is an extra
    bit there that is unused.

    It gets even more complicated when the amount of bits to read from two
    bytes varies. For example, you might need to read five bits, and then
    six bits, and then five bits, which at least starts over each two bytes
    (16 bits), but it might be necessary to read three bits, and then from
    those three bits detemine how many bits the next item contains, and so
    on.

    Dealing with bitstreams rapidly leads to many variables, edge cases,
    inefficiencies, and logical complications. Thus, it's a good idea to
    determine a classification of bit-wise access strategies to help enable
    efficient bitstream use.

    In reading bits, we might consider these cases: a fixed number of bits
    for each record, and a varying or even dynamic number of bits for each
    record. Let's start with a fixed number of bits for each record.

    Bits in a byte are generally numbered from 7 to 0. This is because the
    least significant bit is called bit 0, and the most significant bit (of
    eight bits) is called bit 7, and, because bits in a byte are generally
    ordered from most-significant-bit (msb) to least-significant-bit
    (lsb). The bits toward the left side are called the high bits, towards
    the right side the low bits.

    Now in reading a fixed width of bits from an array of bytes that can
    only be addressed each eight bits, then there are two cases: if the
    width of the bits is a power of two less than or equal to the word
    width, then the records can be accessed relatively simply. For example,
    where the word width is eight for a byte, then bit records of fixed size
    2, 4, or 8 can be accessed in a simple manner, because each word can be
    considered without consideration of the previous or next word, or the
    index of the word in the list of words.

    So an access pattern might be, for each word, for a word that is a byte
    and fixed two bits:

    1. address the word
    2. get first two bits
    3. get second two bits
    4. get third two bits
    5. get fourth two bits
    6. go to step 1 while there are any more words in the input

    That is not too bad, for example in C it might be:

    processbyte:
    byte b = bytearray[i];
    processbits(b & 0xC0 >> 6);
    processbits(b & 0x30 >> 4);
    processbits(b & 0x0C >> 2);
    processbits(b & 0x03);
    i++
    if(i<bytearraylength) goto processbyte;
    ...

    Then, where there are fixed length records of three bits instead of two,
    then instead of that loop that has about five lines, it takes upwards of
    forty or fifty lines to as efficiently as possible from C process in
    order those fixed length records.

    Data is often stored in bit records that are word width:

    processbyte:
    processbits(bytearray[i]);
    i++;
    if(i<bytearraylength) goto processbyte;
    ...

    So anyways in considering bit access strategies, there are lots of cases
    to consider, and what ends up happening is that for each different
    access pattern, and indeed many times for each particular
    implementation, a unique and custom-built and often quite inefficient
    access strategy is laboriously implemented by hand.

    Partially that's because there is no standardized method or set of
    patterns to accomplish the needed work of various bitstream access
    strategies.

    There are a lot of ways that bitstreams are used to encode data. For
    example, if an image has a given number of bits per sample, then often
    the data is packed thus that the data takes less room than if it were
    simply using a few bits of each word. Then, to modify that data for
    most purposes, it needs to be either unpacked or specially modified as
    packed bytes.

    There seem to be two major paradigms of the bit handling: reading and
    writing. In reading, the input is a bitstream and the output is a list
    of symbols that represent the data encoded into those bits. In writing
    it is nearly the opposite: the output is a bitstream and the input is a
    list of symbols.

    One of the primary cursable aspects of dealing with the bitstream data
    is that the bit groups are not necessarily evenly dividing into the word
    width, thus that reading or writing those bit to a contiguous bit stream
    demands in those cases that multiple words be considered at a time.
    Instead of considering only a single register of the computer at a time,
    two or even more registers need to be handled at once. It's bad enough
    when a bit sequence may be within two registers, it's worse when the bit
    sequence width is more than that of two or more registers.

    When the bit sequence is within one register, then the operations to
    isolate it and output it basically unpacked to an array of items that
    each can contain the largest bit sequence is not so bad. For example,
    where the data is msb to lsb in bit order, and MSB to LSB in byte order,
    then on a big-endian machine (MSB to LSB) with 32-bit registers, 32
    contiguous bits of the bitstream can be loaded onto a register at a
    time. Then, the input register is copied into a working register.
    Then, the register can be shifted right, towards the lsb, thus that the
    lsb of the bit sequence is the lsb of the register. Then, where the
    width of the bit sequence is known, the high order bits of the register
    are masked off or cleared, and then the register contains the result bit
    sequence that is then a machine integer representing the needed data.
    That register is then copied into a memory location, the memory pointer
    is updated to refer to the next location, and then the next bit sequence
    is parsed/scanned from the original register.

    That depends on that: the bit sequence width is known ahead of time,
    and that the offset of the beginning of the bit sequence in the register
    and the width of the bit sequence is less than the width of the
    register. If the bit sequence straddles registers, then multiple
    registers need to be considered, and it gets messy.

    So, I've looked around, what do you think about using assembly language
    to manipulate bitstreams? What algorithms, minimal or exotic, are
    reusable for various schemes? How do you use assembler to save the the
    processor billions of cycles over any "high" level language for bit
    sequencing?

    Thanks for your opinions,

    Ross F.

    --
    "Also, consider this:  the unit impulse function times
    one less twice the unit step function times plus/minus
    one is the mother of all wavelets."
    

  • Next message: Ross A. Finlayson: "Re: Kind of new: function implementation questions, MASM"

    Relevant Pages

    • Re: Variadic functions calling variadic functions with the argument list, HLL bit shifts on LE proce
      ... >to file or stream as a sequential sequence of bytes. ... you have to deal with the bytes in the ENDIAN ORDER THEY ... >at once onto the register. ... >So anyways, a point here is that on the little-endian machine, when the ...
      (comp.lang.c)
    • Re: Bits!
      ... >> and that the offset of the beginning of the bit sequence in the register ... process each 3 bits of eax ... performance to load the register with 32 bits that is not on a 32-bit boundary, ...
      (comp.lang.asm.x86)
    • Re: 8 Bit Random Numbers
      ... the shift direction. ... The following C code generates the same bit sequence ... the working register and then shift the register again which rolls ... XORLW 1 ...
      (sci.electronics.basics)
    • Re: 8 Bit Random Numbers
      ... or "0" to clock into the shift register. ... SYMATTR InstName V1 ... '8 bit pseudo-random sequence generator with no lockup state. ...
      (sci.electronics.basics)
    • Re: spartan 3e JTAG programming
      ... Often the start-up sequence requires a few extra clocks to complete. ... Check to see if you are sending enough nulls after the end of the bitstream. ... The startup sequence requires eight more clocks to complete, ...
      (comp.arch.fpga)