# Re: sw multiply

Bill Davy wrote:
"Charles Marslett" <c-deletethis-marslett@xxxxxxxxx> wrote in message
"ivan" <wolfsafety@xxxxxxxxxxxxx> wrote:

I am very tight on time here. I have lots of long moltiplications
to do in e very short time. I needed to roughly estimate how
wastefull would be a moltiplication without a HW multiplier. But
found out that the safest option (with HW multiplier) is also the
cheapest (at least using PICs).

Depending on the architecture, and assuming you have space for a
table of arguments, the following pseudocode can make a really
fast software multiplier:

mul(a, b)
return (sqr(a+b)-(sqr[a] + sqr[b])) >> 1

It costs a table of squares from the minimum argument value to the
sum of the two largest argument values, though, and that can eat a
lot of memory. Otherwise, it involves 3 table lookups, 2 adds, 2
subtracts and a right shift.

And if (a+b) goes over the (let's say byte) boundary and so makes
indexing hard, then (d+256)^2 is "easily" computed with some
simples sums too: sqr(d) + 2.256.d + 1.256.256 where d = (a+b) %
256 (and this only happens about a quarter of the time)

Or, once you have a byte multiplier, you can use the relationship:

A = (a + b)**2 = a**2 + 2ab + b**2
B = (a - b)**2 = a**2 - 2ab + b**2
so
A - B = 4ab
i.e.
ab = (A - B) / 4
requiring 3 one byte multiplications, an add, and a shift. Here a
is the high byte, and b is the low byte. There are some factors of
256 to consider in the rhs of the equation.

Another method that has worked out very well is to build a 8 by 16
bit multiplier, giving a 24 bit product, and call it twice with the
halves of one 16 bit operand. After which a 3 byte add and a carry
propagation gives you the full product. I used this effectively on
an 8080, where everything could be done in registers with a push
and pop or two between the 8*16->24 bit multiplications.

These are unsigned methods.

--
<news:news.announce.newusers
<http://www.geocities.com/nnqweb/>
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>

.

## Relevant Pages

• Re: linear interpolation / Assembler / ATMega32
... > (arithmetic shift right, ASR). ... > There is still one more problem: rounding. ... > If you want to use this rounding algorithm with the sign branching ... > If you do not have a hardware multiplier, ...
(comp.arch.embedded)
• Re: Why fixed point ?
... let's say a multiplier takes two 16 bit integers ... given fixed point format tells us what the implicit scaling is. ... need the shift because I expected the result in a different scaling ...
(comp.dsp)
• Re: floating point multiplier
... The simple answer is that it works like a 24-bit integer multiplier ... 24-bit significand). ... the left-most position (and with each shift, ... mov ebx, offset FLAT:a ...
(comp.lang.asm.x86)
• Re: [RFC][PATCH 3/3] Try to convert non-trivial clocksources to clocksource_register_hz
... I chose a 1:16 shift ratio for the 1:27 input clocks. ... I'll probably add a printk just to see what shift, mult terms you compute. ... partly cuz it was easier to describe in a 1-line mod-desc, ... and add it in after the 1:27 multiplier. ...
(Linux-Kernel)
• Re: [RFC][patch 10/12] move NTP adjusted clock multiplier to struct timekeeper
... The clocksource structure has two multipliers, ... Add the mult field to the struct ... timekeeper to contain the NTP corrected value, ... add the shift value associated with the NTP corrected mult value to ...
(Linux-Kernel)