Re: Finding the highest bit set in a 32-bit number

From: Jim Rogers (jimmaureenrogers_at_att.net)
Date: 03/09/05


Date: Wed, 09 Mar 2005 00:58:24 GMT

Joe Wright <joewwright@comcast.net> wrote in news:97ednZPzptMw2LPfRVn-
gg@comcast.com:

> jimmaureenrogers@worldnet.att.net wrote:
>> Willem wrote:
>>
>>>jimmaureenrogers@worldnet.att.net wrote:
>>>) Here is an Ada solution to this problem:
>>>) ...
>>>)
>>>) This solution uses the trick of overlaying the 32 bit integer
>>>) with a 32 bit packed array of boolean. I then simply iterate
>>>) backwards through the array and report the bit number of the
>>>) first set bit.
>>>
>>>You must have completely overlooked that it's supposed to run fast.
>>
>>
>> I didn't overlook it. How fast does it have to be to qualify
>> as fast? How much slower is my solution than, say, a series of
>> bit shifts encapsulated in a "goto" loop?
>>
>> On my AMD 64 computer running WinXP my solution takes 0.00004
>> seconds to run. That seems pretty fast to me.
>>
> That's 40 microseconds. What's the clock speed of that AMD of yours? If
> it's 2GHZ or so, that's 80,000 clock cycles or so. Fast?
>

Ok. I agree that might not be too fast.
I did another run with the integer having a value of 0.
This is worst case for my algorithm, except that the
program does not slow down to print to the screen.
The value of 0 reveals the worst case timing of the
algorithm itself.

Tests with any other values all resulted in the same
timing, which can only be explained by the I/O bottleneck.

My worst case timing without waiting for I/O is
0.000002235 seconds, which, according to your
calculation is still about 4,000 clock cycles.
Given that my algorithm should have a linear
timing, one could assume average performance at
about half that time. This would also lead me to
estimate about 125 cycles per bit comparison.

I remember reading a timing from one of the
C solutions that claimed 50 - 60 microseconds
for a P4 at 1.4GHz. My 40 microseconds at 2.2GHZ
translates to 62 microseconds at 1.4 GHz. This is
the timing including I/O. Without the I/O
my algorithm should produce a worst case solution
of 3.5 microseconds on a 1.4 GHz P4.

If my calculations ar correct, then my solution does
not look too bad to me.

Jim Rogers



Relevant Pages

  • Re: simple parallel matrix multiplication example
    ... STARTP mul-row1 ENDP ... Timing the serial algorithm: 0.009 seconds elapsed. ... Timing the parallel algorithm: 0.004 seconds elapsed. ...
    (comp.lang.forth)
  • Re: Zip Chip programming info / disk image?
    ... which is doing I/O operations involving a "slow" slot. ... If a peripheral has timing critical access code and it isn't accessing ... Apple 3.5 drives are able to be accessed with the IWM ... The Zip chip reacts to the DEVSEL area. ...
    (comp.sys.apple2)
  • OFDM timing synchronization
    ... I've been struggling with OFDM timing synchronization using Schmidl algorithm for some time but I can't figure out how to find the sample at which the symbol starts. ... I can't simply try all possible shifts to find the correct offset because subcarriers are altered in the channel and I need to know this correct offset within the symbol to perform channel estimation by./ symb_expected) ...
    (comp.dsp)
  • Re: Debugging SDRAM interfaces
    ... do these timing constraints look sane? ... The key to getting good I/O timing is to ensure the tools place the I/O ... For each I/O pin you will see a lot of information including the I/O ...
    (comp.arch.fpga)
  • Re: OFDM timing synchronization
    ... I've been struggling with OFDM timing synchronization using Schmidl ... subcarriers are altered in the channel and I need to know this correct ... I also tried to investigate Park algorithm that gives very sharp metric. ...
    (comp.dsp)