Re: SSE2 register addition (linux gas)




Alexander Knopf wrote:
quick questions about the above.

i'm writing a program that handles large numbers, over 128 bits.
now i understand i could just put 4 DWord sections of any number into an
xmm register, put another 4 DWords into a second and add them.
however, i don't know how the carry flag is handled.
so questions as follows:
1) which byte order would i have to use ?
2) would the carry bit be added to the next higher dword ?
3) what happens if the highest dword addition produces a carry ?

any help is greatly appreciated.

-Alexander Knopf


The short answer is: the four dwords are added completely
independently. No carry propagates from one dword to the next, and the
packed integer adds (PADDD, PADDQ) do not touch EFLAGS, so a carry out
of any element, including the highest, is simply lost. SSE instructions
are NOT meant to operate on 128-bit integers; the widest integer
element they add is a qword. Doing wider-than-qword math in SSE
registers is not obvious and is often done incorrectly.
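
To see the lane independence for yourself, here is a minimal sketch in C using the SSE2 intrinsic _mm_add_epi32 (which compiles to PADDD). The function name add4_dwords is just something I made up for illustration; it assumes an x86 compiler with SSE2 enabled.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds four 32-bit lanes with PADDD.
 * Each lane wraps modulo 2^32 on its own; nothing carries
 * into the neighbouring lane and no flag is set. */
void add4_dwords(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Feeding it 0xFFFFFFFF + 1 in the lowest lane gives 0 in that lane, and the lane above it is just its own sum, with no carry added in.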

Longer answer:
The carry is a horizontal component of the addition. Horizontal, in the
sense that the result of the operation depends on adjacent data
elements in the same register (as opposed to vertical, where it depends
on corresponding elements in a different register). SSE works best
when you confine all dependencies to the vertical variety. This is
possible for 128-bit adds, but it's fairly difficult to pull off
correctly. Since you can use 2 qwords per register (which is better
than 4 dwords), it involves calculating two 128-bit additions at the
same time.

If you had two matrices (or vectors) A and B of 128-bit numbers you
wanted to add, say a1h and a1l are the high and low qwords of element
a1 (a 128-bit number), and likewise for a2, b1 and b2.

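xmm1 = [a1l : a2l]
xmm2 = [b1l : b2l]
xmm5 = xmm1 + xmm2      (paddq: two independent low-qword sums)
store(xmm5)
xmm3 = [a1h : a2h]
xmm4 = [b1h : b2h]
carry = 1 in each lane where the low sum wrapped (unsigned sum < a), else 0
store(xmm3 + xmm4 + carry)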

In this method, we are calculating two elements of our matrix at the
same time. All our dependencies are vertical: all the information
needed to add a1 and b1 sits in the left half of the SSE registers, and
all the information needed to add a2 and b2 sits in the right half.
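
The scheme above could be sketched in C with SSE2 intrinsics roughly like this. It is a sketch under assumptions, not a tuned implementation: the function name add128_x2 and the split lo/hi array layout are my own invention. SSE2 has no unsigned qword compare, so the carry is derived bitwise from the top bits: a carry leaves bit 63 when both inputs have it set, or when either input has it set and the sum does not.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds two pairs of 128-bit numbers at once.
 * Index 0 of every array belongs to the first number, index 1 to
 * the second; *_lo holds the low qwords and *_hi the high qwords. */
void add128_x2(const uint64_t a_lo[2], const uint64_t a_hi[2],
               const uint64_t b_lo[2], const uint64_t b_hi[2],
               uint64_t sum_lo[2], uint64_t sum_hi[2])
{
    __m128i al = _mm_loadu_si128((const __m128i *)a_lo);
    __m128i bl = _mm_loadu_si128((const __m128i *)b_lo);
    __m128i ah = _mm_loadu_si128((const __m128i *)a_hi);
    __m128i bh = _mm_loadu_si128((const __m128i *)b_hi);

    /* Low halves: PADDQ, each lane wraps modulo 2^64. */
    __m128i sl = _mm_add_epi64(al, bl);

    /* Carry out of bit 63, per lane: (a & b) | ((a | b) & ~sum),
     * then shift the top bit down so each lane holds 0 or 1. */
    __m128i both   = _mm_and_si128(al, bl);
    __m128i either = _mm_andnot_si128(sl, _mm_or_si128(al, bl));
    __m128i carry  = _mm_srli_epi64(_mm_or_si128(both, either), 63);

    /* High halves plus carry. A carry out of the high qword is
     * dropped here, i.e. the result is modulo 2^128. */
    __m128i sh = _mm_add_epi64(_mm_add_epi64(ah, bh), carry);

    _mm_storeu_si128((__m128i *)sum_lo, sl);
    _mm_storeu_si128((__m128i *)sum_hi, sh);
}
```

The carry never crosses between the two lanes; it only moves from the low-qword register to the high-qword register in the same lane, which is what keeps every dependency vertical. For numbers wider than 128 bits you would have to compute the carry out of the high add the same way and feed it into the next pair of registers.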
