Re: Do "nops" add to the latency of a program?

Trev wrote:
> Hi all,
> My compiled program has had a few nops put in the listing, and I've
> noted a few places where nops could be used, eg:
>
>     push 5
>     pop edi
>     cmp eda, edi
>
> which I can shorten to cmp eda, 5.
> Does the inclusion of nops slow the program down? Does the computer
> skip over them, or does it still "process" them, even if they do nothing?


They may or may not. On some processors (especially less capable
ones), they always take some time; on many of the faster processors,
they can often slot in between other instructions and effectively
take no time.

On some processors they're actually executed as instructions (for
example, on x86 a nop is actually an "xchg ax,ax", and on early
versions it really was executed as one). Many processors short-circuit
that (for example, on most modern x86s the "xchg ax,ax" sequence will
run a fair bit faster than "xchg bx,bx") and discard nops somewhere
early in the decode process.

So, in short, nops always have to be decoded, which may or may not
impact the execution of your program, and they may or may not actually
consume execution resources, which again may or may not impact the
execution of your program.

In general you want to minimize nops (if nothing else, they take up
space in the instruction cache). *But* they may be useful or necessary
to align things for faster execution. For example, it's often useful on
x86 to align branch targets on 16-byte boundaries, which can be
achieved by putting nops in front of the label. On some processors you
can also optimize the decode process by grouping instructions in a
certain way. For example, a RISC CPU might be able to decode/dispatch
four instructions per cycle, but only two of class X without going
through a "slow" special-case decoder. If you had three class X
instructions in one group, shoving one into the preceding or next group
of instructions may be beneficial, and may leave you with an empty
instruction slot to fill with a nop.

Note that you should always use the canonical form of nop for your
processor. Most ISAs have many effective nops. On x86, for example, all
of the following are effective nops: most xchg's of a register with
itself, a mov of a register to itself, a pair of cmc instructions,
carefully coded bound instructions, jumps (conditional or not) to the
next instruction, a pair of bswaps, certain lea forms, plus a bunch of
forms that rely on already knowing something about the CPU state (for
example, if you know the overflow flag is clear, into is a nop), and a
bunch of others besides. But only the "official" form is likely to get
short-circuit processing in the CPU; the other forms will mostly run as
"normal" instructions that happen not to accomplish anything.


http://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/8059edafb44b200d/f00b235e71c27191?lnk=st&q=robertwessel2+nop&rnum=1&hl=en#f00b235e71c27191
