Re: No need to optimize in assembly anymore

From: Matt Taylor (para_at_tampabay.rr.com)
Date: 05/17/04


Date: Mon, 17 May 2004 21:05:50 +0000 (UTC)


"C" <cc-news@hermes.mirlex.com> wrote in message
news:zb2qc.88$uu2.24@newsfe2-gui.server.ntli.net...
> a wrote:
> > Given that optimizing in assembly for one processor will have
> > no effect (or negative effect) on a different processor it seems
> > that low level optimization is becoming a waste of time. As
> > processors become more sophisticated and diverse in the way they
> > execute code this trend is likely to continue.
> >
> > Anyone agree?
>
> Partially. If you are talking about cycle counting then yes,
> because these counts are non-deterministic (due to out-of-order
> processing) and inconsistent across different processor
> generations and manufacturers (due to different goals in
> choosing hardware optimisations).
<snip>

Cycle-counting isn't quite that non-deterministic. Because of the dependencies
between instructions, the code usually falls into the same cadence regardless
of the initial state upon entry. Cache misses are unpredictable, but there
isn't really anything you can do at that level to avoid them.

Code scheduling tends to improve performance across all architectures. Even
heavily pipelined machines like the Pentium-IV with massive capacity for
in-flight ops see improvement when poorly-scheduled code is optimized in
this fashion. Out-of-order processing helps to hide the differences between
CPUs, but it doesn't make a very good crutch.
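
As a rough source-level analogue of scheduling (a sketch only; in C the
compiler does the actual instruction scheduling, and the function names here
are made up for illustration), compare one long dependency chain with two
independent ones. Whether the second form is actually faster depends on the
compiler and the CPU, so measure before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the result of the previous add. */
static uint32_t sum_serial(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 do not depend on each other,
   so the processor is free to overlap them. */
static uint32_t sum_interleaved(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* one element left over when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions return the same value for any input, since unsigned addition
wraps and can be reordered freely.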

Cycle-counting is also useful since most modern processors have similar
weaknesses and strengths. Multiplies and shifts are a classic example: a
divide by a constant can be converted to a multiply by a constant, and some
constant multiplies can in turn be converted into shifts. The Pentium-IV is a
little bit different, but otherwise x86 processors generally favor the same
simple operations.
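
To make the divide/multiply/shift conversions concrete, here is a minimal
sketch in C for the constant 10. The function names and the choice of 10 are
mine, picked for illustration; a compiler will usually do this for you when
the divisor is a compile-time constant.

```c
#include <stdint.h>

/* Divide by the constant 10 with a multiply and a shift.
   0xCCCCCCCD is ceil(2^35 / 10); taking the 64-bit product and shifting
   right by 35 gives x / 10 for every 32-bit unsigned x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Multiply by the constant 10 with shifts and an add: 10x = 8x + 2x.
   Wraps modulo 2^32 exactly like x * 10u does. */
static uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

For example, div10(1234) gives 123 and mul10(123) gives 1230. Whether the
shift-and-add form beats a plain multiply depends on the processor.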

-Matt