Re: sw multiply

Bill Davy wrote:
"Charles Marslett" <c-deletethis-marslett@xxxxxxxxx> wrote in message
"ivan" <wolfsafety@xxxxxxxxxxxxx> wrote:

I am very tight on time here. I have lots of long moltiplications
to do in e very short time. I needed to roughly estimate how
wastefull would be a moltiplication without a HW multiplier. But
found out that the safest option (with HW multiplier) is also the
cheapest (at least using PICs).

Depending on the architecture, and assuming you have space for a
table of arguments, the following pseudocode can make a really
fast software multiplier:

mul(a, b)
return (sqr(a+b)-(sqr[a] + sqr[b])) >> 1

It costs a table of squares from the minimum argument value to the
sum of the two largest argument values, though, and that can eat a
lot of memory. Otherwise, it involves 3 table lookups, 2 adds, 2
subtracts and a right shift.

And if (a+b) goes over the (let's say byte) boundary and so makes
indexing hard, then (d+256)^2 is "easily" computed with some
simples sums too: sqr(d) + 2.256.d + 1.256.256 where d = (a+b) %
256 (and this only happens about a quarter of the time)

Or, once you have a byte multiplier, you can use the relationship:

A = (a + b)**2 = a**2 + 2ab + b**2
B = (a - b)**2 = a**2 - 2ab + b**2
A - B = 4ab
ab = (A - B) / 4
requiring 3 one byte multiplications, an add, and a shift. Here a
is the high byte, and b is the low byte. There are some factors of
256 to consider in the rhs of the equation.

Another method that has worked out very well is to build a 8 by 16
bit multiplier, giving a 24 bit product, and call it twice with the
halves of one 16 bit operand. After which a 3 byte add and a carry
propagation gives you the full product. I used this effectively on
an 8080, where everything could be done in registers with a push
and pop or two between the 8*16->24 bit multiplications.

These are unsigned methods.

Some informative links:


Relevant Pages

  • Re: linear interpolation / Assembler / ATMega32
    ... > (arithmetic shift right, ASR). ... > There is still one more problem: rounding. ... > If you want to use this rounding algorithm with the sign branching ... > If you do not have a hardware multiplier, ...
  • Re: Why fixed point ?
    ... let's say a multiplier takes two 16 bit integers ... given fixed point format tells us what the implicit scaling is. ... need the shift because I expected the result in a different scaling ...
  • Re: floating point multiplier
    ... The simple answer is that it works like a 24-bit integer multiplier ... 24-bit significand). ... the left-most position (and with each shift, ... mov ebx, offset FLAT:a ...
  • Re: [RFC][PATCH 3/3] Try to convert non-trivial clocksources to clocksource_register_hz
    ... I chose a 1:16 shift ratio for the 1:27 input clocks. ... I'll probably add a printk just to see what shift, mult terms you compute. ... partly cuz it was easier to describe in a 1-line mod-desc, ... and add it in after the 1:27 multiplier. ...
  • Re: Extracting Digits from a Number
    ... three digit numbers. ... The multiplier is 41 and the starting shift is 12. ... The idea is that division by a power of 2 is cheaper than a generic ...