Re: Extended-Precision Division
- From: Terje Mathisen <spamtrap@xxxxxxxxxx>
- Date: Mon, 4 Apr 2005 11:25:57 +0000 (UTC)
Matt wrote:
I am ultimately looking for a way to perform a 96-bit / 64-bit divide. This
is pretty simple to implement via a pair of 64-bit / 64-bit divides, so this
is the problem that I am focusing on at the moment. I know the basic
shifting implementation, but that is excruciatingly slow. I am estimating
that it would be upwards of 300 cycles for the 64-bit / 64-bit divide, so the full 96-bit / 64-bit divide would be around 600 cycles.
Isn't there a faster way?
'There is always a faster way'
(Probably written by Mike Abrash)
First, do you have a 64-bit cpu, like a recent AMD or Intel?
If so, the problem is trivial, since the x86 DIV opcode and register assignements are designed for exactly this sort of stuff:
(I don't have the 64-bit instruction manuals here, so I'll show the solution for a 48 / 32 divide using a 32-bit cpu)
; Divide a [48 bits] by b [32 bits], returning the result in EAX ; and the remainder in EDX. This will fail if the actual result ; doesn't fit in 32 bits!
movzx eax, word ptr a[4] xor edx,edx mov ebx,[b] div ebx ; At this point in time EDX contains the remainder, and EAX ; should hopefully be zero:
; mov [overflow],eax ; Keep this if you want an overflow indication mov eax, dword ptr a[0] div ebx
That's it!
Doing 64-bit divides on a 32-bit machine is a _lot_ harder, so several workarounds might be employed:
1) Solve it by integer DIVs using successive approximation:
a) Shift the divisor up so that it has no leading zero bits.
Remember the shift count
Worst case this doesn't save any bits at all.
b) Divide the top 64 (or 63 to avoid overflow) bits by this
32-bit number, then back-multiply using the full 64-bit
divisor, and subtract the result from the original
96-bit number.c) Repeat (b) one more time, and you should be done.
2) Do the same as (1), but start by calculating a reciprocal divisor that you can use to multiply by instead.
3) Load both the divisor and dividend into fp registers, calculate an approximate answer, then convert this back to a 64-bit integer value.
Do a single back-multiply and subtract to decide if you need to repeat the process, or if you can get away with a single-bit adjustment.
Terje -- - <Terje.Mathisen@xxxxxxxxxxxxx> "almost all programming can be viewed as an exercise in caching"
.
- References:
- Extended-Precision Division
- From: Matt
- Extended-Precision Division
- Prev by Date: Re: Programming with AMD?
- Next by Date: Re: Extended-Precision Division
- Previous by thread: Re: Extended-Precision Division
- Next by thread: Re: Extended-Precision Division
- Index(es):
Relevant Pages
|