Re: Combining two MMX registers into one SSE register?




randyhyde@xxxxxxxxxxxxx wrote:
> robertwessel2@xxxxxxxxx wrote:
>
> >
> > Win64/x86 does not fully save the x87 state, so you cannot use the x87
> > registers and instructions, hence no 80-bit hardware types.
>
> Though I've heard this for quite some time, I've also read that this
> was an urban legend that resulted from the fact that Microsoft's own
> compilers don't use the FPU and, hence, the documentation makes no
> mention of the FPU (so the assumption being that it was not preserved).
> I read somewhere (and the source escapes me now) that *not* saving the
> FPU state on a context switch leads to some security problems (don't
> ask me which ones, I don't know) and, therefore, MS *has* to preserve
> the FPU state.
>
> So the question I have for you is this: "are you *sure* Win64 doesn't
> preserve the FPU state? Or are you just repeating the 'rumor' that has
> gone around in the past?" If you've got some definitive information on
> this, I'd be interested in hearing about it (from a reasonable source,
> of course). Based on the article I read, which referenced the
> FPU-is-not-saved issue, I'm more prone to believe that the FPU state
> *is* saved across context switches. After all, considering how much
> effort it is to *partially* save the FPU state, why not just save it
> all? And if the state is getting preserved in the "32-bit
> environment", why wouldn't it also be saved in the 64-bit environment?
> As you pointed out, the FPU still does many things that cannot be done
> (easily) with SSE.


Well, on doing some more research, it turns out you're (partially)
right. Win64 does preserve the x87 context on context switches. There
were several places in the early documentation for Win64 that
explicitly said otherwise, however. Here's an example:

"Legacy Floating-Point Support - The MMX and floating-point stack
registers (MM0-MM7/ST0-ST7) are volatile. That is, these legacy
floating-point stack registers do not have their state preserved across
context switches."

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/64bitAMD_128662cf-fa29-443b-a61e-d9576e48c7f4.xml.asp

It turns out that this doc confuses kernel and user threads. Kernel
threads save *no* FP state on context switches (and never have); user
threads do have their FP state saved.

There was also a supporting document from AMD that said much the same
thing.

Even the early version of 64-bit MASM issued "Error A2222: x87 and MMX
instructions disallowed; legacy FP state not saved in Win64" when you
specified an x87 instruction, so even MS was confused on this
issue.

Some more recent documentation does, in fact, correct that, although I
can't actually find it online at MSDN (it is included with the
compiler).

The "partially" comes from the fact that the x87 state is *not* handled
in any particular way across function calls. So while you can use the
x87 registers, there's no telling what might come back in them after a
function call. So they're usable, but within a limited context.
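
To make the "limited context" concrete, here is a rough C sketch of the
usual workaround: treat the x87 registers as caller-saved and park any
value you still need in memory before making a call. The function names
are made up for illustration, and a C compiler does this spilling for
you; the volatile slot only makes visible what hand-written assembly
would have to do itself (store before the call, reload after it).

```c
/* Hypothetical callee: stands in for any function that is free to
   overwrite the x87 register stack. */
static long double clobbering_callee(long double x)
{
    return x * 2.0L;
}

/* Compute a + b, keep the result alive across a call, then return it. */
static long double sum_across_call(long double a, long double b)
{
    volatile long double spill;      /* memory slot that survives the call */
    long double sum = a + b;

    spill = sum;                     /* store to memory before the call */
    (void)clobbering_callee(1.0L);   /* register contents are unknown after this */
    sum = spill;                     /* reload from memory after the call */

    return sum;
}
```

Whether long double is actually the 80-bit x87 format depends on the
compiler, so take this as showing the save/reload pattern only.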

In short, I saw all that in the early docs and never bothered to
verify it until now.
