Re: Optimizing Assembler Code



On 14 Jun., 15:02, Frank <spamt...@xxxxxxxxxx> wrote:
The problem is that when you have to access a variable you are stalling
the instruction fetch pipeline and the speculative execution of modern
CPUs.
If you have a case which is mapped to binary search jumps (and only
larger switches with dense labels are mapped into jump tables) then all
these modern features are working well.


What does "you are stalling the fetch pipeline" mean? And which modern
features apply when one does a binary search jump? Could you give me a
reference or elaborate on this?

Thanks a lot.

Frank


On 14 Jun., 20:02, scholz.lothar <spamt...@xxxxxxxxxx> wrote:
On 14 Jun., 15:02, Frank <spamt...@xxxxxxxxxx> wrote:

using 'computed goto' (goto *variable) is slower than routing via
a switch statement (switch(variable) { case 1: goto LABEL_1; ... })

I would need some support from someone who has extensive assembler
experience, in order to understand what is going on.
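
For readers who have not seen the two dispatch styles side by side, here is a minimal sketch. It assumes GNU C (the `&&label` and `goto *ptr` forms are a GCC extension, also accepted by Clang), and the toy opcodes and function names are made up for illustration; this is not the code from the original question.

```c
/* Hypothetical opcodes for a toy accumulator machine. */
enum { OP_INC, OP_DEC, OP_DBL, OP_HALT };

/* Dispatch through a switch: the compiler decides whether this becomes
   a chain of compares and conditional branches or a jump table. */
long run_switch(const unsigned char *code)
{
    long acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC: acc += 1; break;
        case OP_DEC: acc -= 1; break;
        case OP_DBL: acc *= 2; break;
        default:     return acc;   /* OP_HALT or an unknown opcode */
        }
    }
}

/* Dispatch through computed goto: every step loads a target address
   from a table and jumps to it indirectly.
   Assumption: the program only contains the four opcodes above,
   since the table index is not range-checked. */
long run_goto(const unsigned char *code)
{
    static void *const target[] = { &&do_inc, &&do_dec, &&do_dbl, &&do_halt };
    long acc = 0;

#define NEXT() goto *target[*code++]
    NEXT();
do_inc:  acc += 1; NEXT();
do_dec:  acc -= 1; NEXT();
do_dbl:  acc *= 2; NEXT();
do_halt: return acc;
#undef NEXT
}
```

Both functions compute the same result for a valid program; which one is faster depends on the compiler, the number of labels and the CPU, so it has to be measured.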

How many labels do you have in the switch statement?

The problem is that when you have to access a variable you are stalling
the instruction fetch pipeline and the speculative execution of modern
CPUs.
If you have a case which is mapped to binary search jumps (and only
larger switches with dense labels are mapped into jump tables) then all
these modern features are working well.
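
To illustrate what "binary search jumps" means, here is a rough sketch in C. The case values and function names are invented, and what a given compiler actually emits for the switch may differ; the second function only writes out by hand the kind of compare-and-branch tree being described.

```c
/* Hypothetical sparse case labels; the values are made up. */
int lookup_switch(int x)
{
    switch (x) {
    case 3:    return 1;
    case 40:   return 2;
    case 100:  return 3;
    case 9000: return 4;
    default:   return 0;
    }
}

/* The same decision written out as a binary search over the labels.
   It uses only compares and conditional branches to fixed targets;
   no jump address is loaded from memory. */
int lookup_unfolded(int x)
{
    if (x < 100) {
        if (x == 3)  return 1;
        if (x == 40) return 2;
        return 0;
    }
    if (x == 100)  return 3;
    if (x == 9000) return 4;
    return 0;
}
```

Every branch target in the unfolded version is known at compile time, so the CPU only has to predict taken or not taken, whereas a jump table or `goto *variable` needs the target address itself before it can continue fetching.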

This is one of the reasons for SmartEiffel to be faster than C++. It
uses unfolded ifs instead of virtual method tables for virtual function
calls.
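
As a sketch of the idea only (this is not SmartEiffel's generated code, and the types and names are hypothetical), the difference between a table-based virtual call and an "unfolded if" looks roughly like this in C:

```c
/* Hypothetical object layout with a type tag and one vtable-style slot. */
enum shape_kind { KIND_SQUARE, KIND_RECT };

struct shape {
    enum shape_kind kind;
    long w, h;
    long (*area)(const struct shape *);   /* function pointer slot */
};

static long square_area(const struct shape *s) { return s->w * s->w; }
static long rect_area(const struct shape *s)   { return s->w * s->h; }

/* Table-style dispatch: the call target is loaded from memory,
   so the call is indirect. */
long area_indirect(const struct shape *s)
{
    return s->area(s);
}

/* Unfolded dispatch: compare the type tag and call a fixed function.
   Each call is direct and can be inlined by the compiler. */
long area_unfolded(const struct shape *s)
{
    if (s->kind == KIND_SQUARE)
        return square_area(s);
    return rect_area(s);
}
```

The unfolded form requires knowing all possible types at compile time, which a whole-program compiler can do and a separately compiled C++ translation unit generally cannot.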

Remember that during a second-level cache miss you have about 200
cycles to wait, time in which the CPU could do 400-600 operations.
That's a lot.

For everything else, use VTune to go into the details.
