Re: Fast addition for n+1 or n+0

From: Walter Roberson (roberson_at_ibd.nrc-cnrc.gc.ca)
Date: 02/18/05


Date: 18 Feb 2005 18:06:09 GMT

In article <cv563d$d73$1@canopus.cc.umanitoba.ca>,
Walter Roberson <roberson@ibd.nrc-cnrc.gc.ca> wrote:
|In article <37ll5mF5es672U1@individual.net>,
|Alex Vinokur <alexvn@big-foot.com> wrote:

|:Consider the following statement:
|:n+i, where i = 1 or 0.
|:Is there more fast method for computing n+i than direct computing that sum?

|It depends on the costs you assign to the various operations -- a
|matter which is architecture dependant.

There is a possibility that would be slower in any real
architecture that I've ever heard of, but which could be faster
under very narrow circumstances.

 (n&1) ? (n+i) : (n|i)

The narrow circumstances under which this could be faster are:
- this is within a tight loop that fits within the processor's
  primary instruction cache
- the processor has a "move conditional" operation that
  avoids taking an actual branch when the operations are
  simple enough and the result is being used arithmetically
  instead of to control a branch
- at the microcode level, the processor "runs free"
  when working from instruction cache, processing each
  instruction as fast as possible instead of working
  on a bus-cycle system (which is needed in most cases
  when anything outside the primary cache is being referenced)
- the cost of the bitwise AND operation plus the cost of the
  comparison to 0 plus the cost of the bitwise OR operation,
  are faster than the cost of a full addition

I have heard of one architecture (I don't recall which)
that had a "move conditional" operation that took
a test condition and two arithmetic operations as operands,
and would start doing the two artihmetic operations in
parallel at the same time it was doing the test; when the
result of the test was available, it would abort the false
branch if it was not already finished, with the result
being whichever of the arithmetic expressions was selected
by the condition.

Please note how narrow these conditions are: you would
have to know a LOT about your processor to make this kind
of optimization: the expression I give above will be slower
than a straight addition on nearly every architecture.

Addition is usually hard-coded through a series of
transistors, with the carry circuit taking most of the
landscape. It's hard to beat transistor-level speeds
by using multiple instructions.

I have heard that some architectures internally
optimize +0 and +1; there would be no way to beat that...
but again you would need to know intimate details of
the architecture.

-- 
   I don't know if there's destiny,
   but there's a decision!                  -- Wim Wenders (WoD)


Relevant Pages

  • Re: Fast addition for n+1 or n+0
    ... |:Is there more fast method for computing n+i than direct computing that sum? ... architecture that I've ever heard of, ... when working from instruction cache, ... the cost of the bitwise AND operation plus the cost of the ...
    (comp.lang.cpp)
  • Re: New fully open ISA challenges
    ... implementation cost, performance, power consumption, and/or ... While opening the Architecture to allow others to ... no cost or at modest cost tends to increase the adoption of the ... ISAs are a different matter. ...
    (comp.arch)
  • Re: MasPar compiler and simulator
    ... blue gene came from the research division. ... What might be the best architecture is, at this moment, ... I would guess that, if there is any approach to high performance computing mentioned, someone at IBM has probably taken a good look at it. ... After all the existing GPU designs are not optimized for use in large scale computing. ...
    (comp.arch)
  • Re: Phase Unwrap in FPGA?
    ... > I have a dig receiver architecture and I need to generate frequency ... The phase (atan2) is computed using a CORDIC (magnitude ... Often the freqency is computed directly by computing the arctan of the ... Unwrapping is not required with this ...
    (comp.dsp)
  • Re: New fully open ISA challenges
    ... While opening the Architecture to allow others to ... no cost or at modest cost tends to increase the adoption of the ... small code size and very small silicon area for a 16 bit design. ... I designed the ISA for minimal area silicon and code size, ...
    (comp.arch)

Loading