Re: SSE2 register addition (linux gas)
- From: "ldb" <spamtrap@xxxxxxxxxx>
- Date: 29 Aug 2006 13:27:28 -0700
Alexander Knopf wrote:
ldb wrote:
Alexander Knopf wrote:
quick questions about the above.
i'm writing a program that handles large numbers, over 128 bits.
now i understand i could just put 4 DWord sections of any number into an
xmm register, put another 4 DWords into a second and add them.
however, i don't know how the carry flag is handled.
so questions as follows:
1) which byte order would i have to use ?
2) would the carry bit be added to the next higher dword ?
3) what happens if the highest dword addition produces a carry ?
any help is greatly appreciated.
-Alexander Knopf
The short answer is: They will add the four dwords completely
independently, and the carry flag goes into the EFLAGS register, if I
remember correctly. SSE instructions are NOT meant to operate on
128-bit data types. They are meant to work on, up to, qwords. Doing
higher than qword math in SSE registers correctly is not obvious and
generally done incorrectly.
so basically what you're saying is, since i can do addition of 2 QWords,
and there's only one carry flag i would have to check the values anyhow.
now then, would it be easier to use 32 bit registers and use adc to add
the carry flag to the next value ?
-Alexander Knopf
Actually, the Intel manual's entry for the PADDQ (add packed quadword
integers) instruction says this:
"When a quadword result is too large to be represented in 64 bits
(overflow), the result is wrapped around and the low 64 bits are
written to the destination element (that is, the carry is ignored)....
... however, it does not set bits in the EFLAGS register to indicate
overflow and/or a carry. To prevent undetected overflow conditions,
software must control the ranges of the values operated on."
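To make the quoted behaviour concrete, here is a minimal C sketch (not from the manual) that adds two pairs of qwords the way PADDQ does and then recovers the lost carries with an unsigned compare. The function name paddq_with_carry is made up for illustration, and the scalar fallback is only an assumption so that the sketch also builds where SSE2 is unavailable.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; _mm_add_epi64 maps to PADDQ */
#endif

/* Hypothetical helper: add two pairs of 64-bit lanes. Each lane wraps
   modulo 2^64 and nothing is written to EFLAGS. Because the instruction
   discards the carry, it is recomputed afterwards: an unsigned sum that
   is smaller than one of its operands must have wrapped. */
void paddq_with_carry(const uint64_t a[2], const uint64_t b[2],
                      uint64_t sum[2], int carry_out[2])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)sum, _mm_add_epi64(va, vb));
#else
    /* Scalar model of the same lane-wise wrap-around. */
    sum[0] = a[0] + b[0];
    sum[1] = a[1] + b[1];
#endif
    carry_out[0] = sum[0] < a[0];
    carry_out[1] = sum[1] < a[1];
}
```

Note that a carry out of lane 0 is never added into lane 1, which is why a single PADDQ is not a 128-bit add.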
The moral of the story here is that SSE is a poor fit for multiprecision
arithmetic. People often come in here wanting to use, for
instance, 128-bit integers, and believe that XMM registers (which are
128 bits wide) are the silver bullet. It turns out, in fact, they are not.
If you need to add 2 pairs (i.e. a+b and c+d) of 128-bit numbers, then
SSE -can- work, with some trickery and some benefit (you do a+b in one
64-bit lane and c+d in the other, handling the carries yourself)... but
to do a single 128-bit add, it is -much- easier to just do it in normal
32-bit registers with add and adc.
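For reference, the add/adc chain amounts to the following, written as a C sketch rather than gas so the carry handling is explicit. The name add128_limbs is hypothetical, and storing the least significant dword first is an assumption of this sketch (it matches the usual little-endian layout on x86).

```c
#include <stdint.h>

/* Hypothetical helper: a, b and sum each hold one 128-bit number as four
   32-bit limbs, least significant limb first. Returns the carry out of
   the top limb (0 or 1), i.e. what the carry flag would hold after the
   last adc. */
unsigned add128_limbs(const uint32_t a[4], const uint32_t b[4],
                      uint32_t sum[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        /* i == 0 plays the role of add, later limbs the role of adc. */
        uint64_t t = (uint64_t)a[i] + b[i] + carry;
        sum[i] = (uint32_t)t;          /* low 32 bits are the result limb */
        carry = (unsigned)(t >> 32);   /* bit 32 is the carry into the next limb */
    }
    return carry;
}
```

The return value covers the case where the highest dword addition produces a carry: the sum wraps modulo 2^128 and the caller decides whether that is an overflow.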
Now, if you add two giant vectors of 128-bit integers, then with correct
programming the SSE version would probably outperform a plain 32-bit
solution. The operative word there is 'probably'. It's more complicated
than just doing a for loop over each element, however. The gist of the
solution is that you need to add pairs of elements simultaneously.
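One possible shape for "adding pairs of elements simultaneously" is sketched below in C with SSE2 intrinsics. It is only an illustration under stated assumptions: the function name add128_arrays is made up, the numbers are assumed to be stored as separate arrays of low and high 64-bit halves, and the carry is rebuilt with a bitwise identity because SSE2 has no unsigned 64-bit compare. No performance claim is implied.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: r[i] = a[i] + b[i] (mod 2^128) for n numbers.
   Assumed layout: low halves in alo/blo/rlo, high halves in ahi/bhi/rhi. */
void add128_arrays(const uint64_t *alo, const uint64_t *ahi,
                   const uint64_t *blo, const uint64_t *bhi,
                   uint64_t *rlo, uint64_t *rhi, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Two 128-bit numbers per iteration: one per 64-bit lane. */
    for (; i + 2 <= n; i += 2) {
        __m128i al = _mm_loadu_si128((const __m128i *)(alo + i));
        __m128i bl = _mm_loadu_si128((const __m128i *)(blo + i));
        __m128i ah = _mm_loadu_si128((const __m128i *)(ahi + i));
        __m128i bh = _mm_loadu_si128((const __m128i *)(bhi + i));
        __m128i sl = _mm_add_epi64(al, bl);   /* low halves, carry dropped */
        /* Carry out of each lane is the top bit of (a & b) | ((a | b) & ~sum). */
        __m128i c = _mm_or_si128(_mm_and_si128(al, bl),
                                 _mm_andnot_si128(sl, _mm_or_si128(al, bl)));
        c = _mm_srli_epi64(c, 63);            /* 0 or 1 in each lane */
        __m128i sh = _mm_add_epi64(_mm_add_epi64(ah, bh), c);
        _mm_storeu_si128((__m128i *)(rlo + i), sl);
        _mm_storeu_si128((__m128i *)(rhi + i), sh);
    }
#endif
    /* Scalar tail (and the whole loop when SSE2 is not available). */
    for (; i < n; i++) {
        uint64_t sl = alo[i] + blo[i];
        uint64_t carry = sl < alo[i];
        rhi[i] = ahi[i] + bhi[i] + carry;
        rlo[i] = sl;
    }
}
```

Whether this beats a plain add/adc loop depends on the CPU and on whether the data is already in this layout, so it would need measuring.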