Re: MOVNTQ on Athlon

From: Robert Redelmeier (redelm_at_ev1.net.invalid)
Date: 05/31/04


Date: Mon, 31 May 2004 02:44:04 +0000 (UTC)

CHK <b@b.biz> wrote:
> This code also runs on dual Athlon MP system, but looks like the Athlon is
> choking on this instruction. MOVNTQ takes lot more cycles on Athlon compared
> to Pentium-III. In fact I get better speed on Athlon when I use MOVQ.

Odd. I use `movntq` on Athlon and get good speed.

Of course, you only use it for writes, not reads. Reads need
the cache. With writes, you want to avoid the needless cache-line
load before the modify, iff you're going to overwrite the complete line.
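
A rough sketch of the full-line write case in C rather than asm, in case
it helps. It assumes the `_mm_stream_pi` intrinsic, which compilers
normally emit as `movntq`. The name `fill_stream` is made up for
illustration, and the 64-byte line size in the comments is an assumption
about the Athlon (the Pentium III uses 32-byte lines). This is not the
original poster's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_stream_pi, _mm_sfence; also pulls in the MMX types */

/* Hypothetical helper: store `value` into n_qwords consecutive 8-byte
 * words starting at dst, using non-temporal (write-combining) stores.
 * Assumptions: dst is 8-byte aligned. For the stores to combine into
 * whole lines, dst should also start on a line boundary and cover a
 * multiple of the line size (64 bytes assumed here), so that every
 * touched line is completely overwritten and never needs to be read. */
void fill_stream(uint64_t *dst, uint64_t value, size_t n_qwords)
{
    /* Build the 64-bit MMX value from two 32-bit halves (high, low). */
    __m64 v = _mm_set_pi32((int)(uint32_t)(value >> 32),
                           (int)(uint32_t)value);

    for (size_t i = 0; i < n_qwords; i++)
        _mm_stream_pi((__m64 *)&dst[i], v);  /* non-temporal 8-byte store */

    _mm_sfence();  /* order the streaming stores before later stores */
    _mm_empty();   /* emms: leave MMX state so x87 code can run again */
}
```

If only part of a line gets written, the write-combining buffer is
flushed as a partial write, which is where the benefit over a plain
`movq` store tends to disappear.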

-- Robert


