Influence of local code modifications



Hi,

Maybe you have some ideas on how to cope with this problem:

I'm trying to optimize assembler code for a complex DSP, an
in-order-issue superscalar processor with an integer pipeline and a
load/store pipeline, i.e. in the best case two instructions can be
issued simultaneously, one on each pipeline. Before instructions
(which may be 16 or 32 bits wide) are decoded, they are fetched into a
64-bit fetch buffer. The fetch is therefore aligned to 8-byte
addresses, and if an instruction at a branch target crosses an 8-byte
boundary (misalignment), the processor stalls for additional cycles
(extra transfer-of-control penalties). The DSP also has an instruction
cache, which complicates things further: multiple instructions are
read from the cache at once, and they may in turn span multiple cache
lines, costing extra cycles.
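
To make the misalignment condition concrete, here is a minimal Python
sketch. The function name and its parameters are only an illustration
(nothing here comes from the DSP's tool chain), and it models the
8-byte fetch boundary alone, not the instruction cache or the actual
stall cycles.

```python
FETCH_BYTES = 8  # 64-bit fetch buffer, so fetches are aligned to 8 bytes


def crosses_fetch_boundary(addr, size, fetch_bytes=FETCH_BYTES):
    """Return True if an instruction of `size` bytes starting at `addr`
    extends past the end of the aligned fetch window it starts in.

    Hypothetical helper: assumes a misaligned branch target is exactly
    one whose bytes do not fit in a single aligned fetch window.
    """
    offset = addr % fetch_bytes          # position inside the fetch window
    return offset + size > fetch_bytes   # spills into the next window?


# A 32-bit instruction starting 6 bytes into a window spills over,
# while a 16-bit instruction at the same address still fits.
wide_at_6 = crosses_fetch_boundary(0x1006, 4)
narrow_at_6 = crosses_fetch_boundary(0x1006, 2)
wide_at_4 = crosses_fetch_boundary(0x1004, 4)
```

With these assumptions only the 32-bit instruction at offset 6 would
count as a misaligned branch target.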

My optimization moves basic blocks (selected by some cost functions)
from the slow main memory to a small but fast memory, so that these
particular blocks can be accessed quickly. However, I have serious
problems with the "optimized" code. The moved blocks benefit from the
faster memory, but moving them obviously changes the addresses of the
subsequent instructions. Sometimes adding a single instruction, which
shifts the addresses of the following code, is enough to change the
runtime significantly. The reasons are newly misaligned jump targets
and differently filled fetch buffers, and hence a different filling of
the superscalar pipeline, which may have a positive or a negative
effect on the total program runtime.
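
The address-shift effect can be reproduced with a toy layout model. The
sketch below is only an illustration under simplifying assumptions:
the instruction sizes and branch targets are invented, code is laid out
back to back from address 0, and a target counts as misaligned when it
crosses an 8-byte fetch boundary. Cache effects and pipeline pairing
are not modelled.

```python
FETCH_BYTES = 8  # 64-bit fetch buffer


def layout(sizes, base=0):
    """Start address of each instruction when placed back to back."""
    addrs = []
    addr = base
    for size in sizes:
        addrs.append(addr)
        addr += size
    return addrs


def misaligned_targets(sizes, targets, base=0):
    """Indices of branch targets whose instruction crosses a fetch boundary."""
    addrs = layout(sizes, base)
    return [i for i in targets
            if (addrs[i] % FETCH_BYTES) + sizes[i] > FETCH_BYTES]


# Hypothetical instruction stream: sizes in bytes (16- or 32-bit encodings).
sizes = [4, 4, 4, 2, 4, 4, 2, 4]
targets = [1, 4]  # hypothetical branch targets, as indices into `sizes`

before = misaligned_targets(sizes, targets)

# Insert one 16-bit instruction at the front: every later address moves
# by 2 bytes and every target index moves by one.
shifted_sizes = [2] + sizes
shifted_targets = [t + 1 for t in targets]

after = misaligned_targets(shifted_sizes, shifted_targets)

print("misaligned before:", before)  # prints: misaligned before: [4]
print("misaligned after: ", after)   # prints: misaligned after:  [2]
```

In this made-up stream the inserted instruction fixes one misaligned
target and breaks another, so the net effect depends on how often each
target is actually taken.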

So my problem is that I can achieve a local improvement for the moved
blocks, but the resulting global effect is not predictable: it may
cancel out the benefit or even make the program slower overall.

How do compiler developers cope with this problem? Are there any
approaches that make it possible to predict the influence of a local
code optimization on global code performance for complex processors?

Regards,
Tim