Re: SSE2 register addition (linux gas)
- From: "ldb" <spamtrap@xxxxxxxxxx>
- Date: 25 Aug 2006 06:33:07 -0700
Alexander Knopf wrote:
quick questions about the above.
i'm writing a program that handles large numbers, over 128 bits.
now i understand i could just put 4 DWord sections of any number into an
xmm register, put another 4 DWords into a second and add them.
however, i don't know how the carry flag is handled.
so questions as follows:
1) which byte order would i have to use ?
2) would the carry bit be added to the next higher dword ?
3) what happens if the highest dword addition produces a carry ?
any help is greatly appreciated.
-Alexander Knopf
The short answer is: the four dwords are added completely
independently. A carry out of one dword is not propagated into the
next higher dword; it is simply discarded, and the packed adds (PADDD,
PADDQ) do not touch the carry flag in EFLAGS. SSE instructions are NOT
meant to operate on 128-bit integers. They are meant to work on
elements of up to qword size. Doing wider-than-qword math in SSE
registers correctly is not obvious and is often done incorrectly.
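To make the lane independence concrete, here is a minimal sketch in C using the SSE2 intrinsic for PADDD. The function name add_dwords is made up for illustration, and it assumes an x86 compiler that provides <emmintrin.h>.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: add four packed dwords with PADDD.
 * Each lane is computed as a[i] + b[i] modulo 2^32; nothing is
 * carried from one lane into its neighbour. */
void add_dwords(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Adding 1 to a lane that holds 0xFFFFFFFF wraps that lane to 0 and leaves the lane above it unchanged, so any carry has to be reconstructed by hand.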
Longer answer:
There is a horizontal component associated with this (the carry).
Horizontal in the sense that the result of the operation depends on
adjacent data elements in the same register (as opposed to vertical,
where it depends only on the corresponding elements of a different
register). SSE works best when you confine all dependencies to the
vertical variety. This is possible for 128-bit adds, but it's fairly
difficult to pull off correctly. Since you can use 2 qwords per
register (which is better than 4 dwords), it involves calculating two
128-bit additions at the -same- time.
If you had two matrices (or vectors) A and B of 128-bit numbers that
you wanted to add, and we say a1h and a1l are the high and low qwords
of element a1 (a 128-bit number):
xmm1 = [a1l : a2l]
xmm2 = [b1l : b2l]
store(xmm1 + xmm2)
xmm3 = [a1h : a2h]
xmm4 = [b1h : b2h]
work out the carry from each low-qword add (no flag is set for you)
store(xmm3 + xmm4 + carries)
In this method, we are calculating -two- elements of our matrix at the
same time. And all our dependencies are vertical, so that all
information needed to add a1 and b1 occurs in the left half of the SSE
registers, and all information needed to add a2 and b2 occurs in the
right half.
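The scheme above can be sketched with SSE2 intrinsics roughly as follows. This is only one way to do it, not necessarily the cheapest: the function name add128_x2 and the split lo/hi array layout are assumptions for illustration. Since SSE2 has no unsigned qword compare, the carry is rebuilt from the top bits of the operands and the sum.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: two independent 128-bit additions at once.
 * Index 0 holds the qwords of element 1, index 1 those of element 2,
 * so every dependency stays inside its own half of the register. */
void add128_x2(const uint64_t a_lo[2], const uint64_t a_hi[2],
               const uint64_t b_lo[2], const uint64_t b_hi[2],
               uint64_t sum_lo[2], uint64_t sum_hi[2])
{
    __m128i al = _mm_loadu_si128((const __m128i *)a_lo); /* [a1l : a2l] */
    __m128i bl = _mm_loadu_si128((const __m128i *)b_lo); /* [b1l : b2l] */
    __m128i ah = _mm_loadu_si128((const __m128i *)a_hi); /* [a1h : a2h] */
    __m128i bh = _mm_loadu_si128((const __m128i *)b_hi); /* [b1h : b2h] */

    /* Low halves: PADDQ, the carry out of bit 63 is dropped. */
    __m128i lo = _mm_add_epi64(al, bl);

    /* Rebuild the dropped carry: it is the top bit of
     * (a & b) | ((a | b) & ~sum), moved down to bit 0. */
    __m128i gen   = _mm_and_si128(al, bl);
    __m128i prop  = _mm_andnot_si128(lo, _mm_or_si128(al, bl));
    __m128i carry = _mm_srli_epi64(_mm_or_si128(gen, prop), 63);

    /* High halves plus carry; a carry out of bit 127 is discarded. */
    __m128i hi = _mm_add_epi64(_mm_add_epi64(ah, bh), carry);

    _mm_storeu_si128((__m128i *)sum_lo, lo);
    _mm_storeu_si128((__m128i *)sum_hi, hi);
}
```

The carry is produced as a 0 or 1 in each qword lane, so it can be added to the high halves with an ordinary PADDQ and never has to cross from one half of the register to the other.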