# Re: sw multiply

Bill Davy wrote:
"Charles Marslett" <c-deletethis-marslett@xxxxxxxxx> wrote in message
"ivan" <wolfsafety@xxxxxxxxxxxxx> wrote:

I am very tight on time here. I have lots of long moltiplications
to do in e very short time. I needed to roughly estimate how
wastefull would be a moltiplication without a HW multiplier. But
found out that the safest option (with HW multiplier) is also the
cheapest (at least using PICs).

Depending on the architecture, and assuming you have space for a
table of arguments, the following pseudocode can make a really
fast software multiplier:

mul(a, b)
return (sqr(a+b)-(sqr[a] + sqr[b])) >> 1

It costs a table of squares from the minimum argument value to the
sum of the two largest argument values, though, and that can eat a
lot of memory. Otherwise, it involves 3 table lookups, 2 adds, 2
subtracts and a right shift.

And if (a+b) goes over the (let's say byte) boundary and so makes
indexing hard, then (d+256)^2 is "easily" computed with some
simples sums too: sqr(d) + 2.256.d + 1.256.256 where d = (a+b) %
256 (and this only happens about a quarter of the time)

Or, once you have a byte multiplier, you can use the relationship:

A = (a + b)**2 = a**2 + 2ab + b**2
B = (a - b)**2 = a**2 - 2ab + b**2
so
A - B = 4ab
i.e.
ab = (A - B) / 4
requiring 3 one byte multiplications, an add, and a shift. Here a
is the high byte, and b is the low byte. There are some factors of
256 to consider in the rhs of the equation.

Another method that has worked out very well is to build a 8 by 16
bit multiplier, giving a 24 bit product, and call it twice with the
halves of one 16 bit operand. After which a 3 byte add and a carry
propagation gives you the full product. I used this effectively on
an 8080, where everything could be done in registers with a push
and pop or two between the 8*16->24 bit multiplications.

These are unsigned methods.

--