Re: Efficiency



lods and stosb take many clock cycles;
a plain mov such as mov ax,[si] takes less time.
So it uses more bytes, but it is faster.


"Dirk Wolfgang Glomp" <dirk@xxxxxxxxxxxxxxxxxxx> wrote in message
news: 1kjl6gfgwyb8c$.16mxth2njz8mc.dlg@xxxxxxxxxxxxx
On Fri, 26 Oct 2007 09:49:20 -0700, KJH wrote:

Let's say I want to write compact and fast code.

I'm thinking about something like this:

; proc returns ZF set if char is 'a' or 'z'
myproc proc
cmp al,'a'
je @@1
cmp al,'z'
@@1: ret
myproc endp

compared to C version:

int myfunc(int c)
{
if (c == 'a' || c == 'z')
return 1;
return 0;
}


What is the penalty? Okay, these are very basic and small routines,

Yes, I never use subroutines for just a few bytes of code.
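
To illustrate the point about tiny subroutines, here is a minimal C sketch. The names is_a_or_z and count_a_or_z are hypothetical, not from the thread, and it assumes a compiler that expands small static inline functions at the call site. In that case the two compares end up inside the caller's loop and no call/ret is paid.

```c
#include <stddef.h>

/* Hypothetical helper: same test as myfunc above, written so the
   compiler can expand it at each call site instead of emitting a call. */
static inline int is_a_or_z(int c)
{
    return c == 'a' || c == 'z';
}

/* Hypothetical caller: counts matching characters in a
   NUL-terminated string. */
static size_t count_a_or_z(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        n += (size_t)is_a_or_z((unsigned char)*s);
    return n;
}
```

Whether the helper is really inlined depends on the compiler and its optimization level, so it is worth checking the generated assembly.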

but I guess that modern C compilers are more aware about optimal
alignment

Sure, more so than older C compilers.

and also it seems that modern Pentiums and above really like
their instructions more RISC-like (aka boring instr set).

That is useful for generating blended code that runs fast on many different CPUs.

I am used to instructions like lodsb, stosb, inc, dec etc... I'm
adapted to that kind of mental image, lodsb for example in my mind is
a very convenient instruction to load a byte from memory pointed to by
DS:ESI. I can visualize it. Also, the loop instruction is nice; it's
compact (in a conceptual way), but I don't know...

Why is it so that nowadays:

mov eax,[esi]
mov [edi],eax

is more efficient than

lodsd
stosd

?

If the special registers that the lods/stos/movs or loop instructions
depend on are not free, they must be saved and reloaded; otherwise their
values will be lost. So it is sometimes easier to use other registers for
the job and avoid those dependencies.

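ABOVE:
mov eax,[ecx]   ; load a dword through any free register, not only ESI
mov [edx],eax   ; store through any free register, not only EDI
add ecx,4       ; advance the source pointer
add edx,4       ; advance the destination pointer
dec esi         ; ESI is free here, so it can serve as the loop counter
jnz ABOVE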
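
The same loop can be sketched in C, where the compiler, rather than the instruction set, decides which registers hold the pointers and the counter. This is only an illustration: copy_dwords and its parameter names are made up here, and the two buffers are assumed not to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies count 32-bit words from src to dst.
   Nothing here ties src, dst or count to ESI, EDI or ECX; the
   compiler may keep them in whatever registers are available. */
static void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    while (count != 0) {
        *dst++ = *src++;   /* load, store, advance both pointers */
        --count;           /* count down to zero                 */
    }
}
```

Unlike the assembly loop above, this version checks the counter before the first copy, so a count of zero copies nothing.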

I just don't get it.


But to my actual question, is it likely that I'm going to take a speed
hit or cache misses by using these old asm constructs vs. more modern
C-like constructs? Code compactness doesn't necessarily translate to
speed, no?


Hopefully somebody can make some sense of what I'm talking about; I'm not
sure if I'm expressing myself clearly :)

You can get all of this code-optimization material from Intel/AMD.

For example, "248966.pdf":
Intel® 64 and IA-32 Architectures
Optimization Reference Manual

Dirk

