Re: No need to optimize in assembly anymore
From: Matt Taylor (para_at_tampabay.rr.com)
Date: 05/17/04
- Next message: Bryan Parkoff: "Re: DirectX in HLA"
- Previous message: The Passer-by: "Re: Emulating FPU"
- In reply to: C: "Re: No need to optimize in assembly anymore"
- Next in thread: C: "Re: No need to optimize in assembly anymore"
- Reply: C: "Re: No need to optimize in assembly anymore"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 17 May 2004 21:05:50 +0000 (UTC)
"C" <cc-news@hermes.mirlex.com> wrote in message
news:zb2qc.88$uu2.24@newsfe2-gui.server.ntli.net...
> a wrote:
> > Given that optimizing in assembly for one processor will have
> > no effect (or negative effect) on a different processor it seems
> > that low level optimization is becoming a waste of time. As
> > processors become more sophisticated and diverse in the way they
> > execute code this trend is likely to continue.
> >
> > Anyone agree?
>
> Partially. If you are talking about cycle counting then yes,
> because these counts are non-deterministic (due to out-of-order
> processing) and inconsistent across different processor
> generations and manufacturers (due to different goals in
> choosing hardware optimisations).
<snip>
Cycle-counting isn't quite that non-deterministic. Because of data
dependencies, the code usually settles into the same cadence regardless of
the machine state on entry. Cache misses are unpredictable, but there isn't
really anything you can do at the instruction level to avoid them.
Code scheduling tends to improve performance across all architectures. Even
heavily pipelined machines like the Pentium-IV with massive capacity for
in-flight ops see improvement when poorly-scheduled code is optimized in
this fashion. Out-of-order processing helps to hide the differences between
CPUs, but it doesn't make a very good crutch.
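
To make the scheduling point concrete, here is a minimal sketch in C rather
than assembly. It is only an illustration: the function names sum_chained and
sum_split are hypothetical, and whether the second form is actually faster
depends on the compiler and the CPU. The first loop forms a single dependency
chain, since every add waits on the previous one. The second splits the work
into two independent chains that can overlap in the pipeline.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the result of the previous add. */
static uint32_t sum_chained(const uint32_t *v, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 do not depend on each other,
 * so they can be in flight at the same time. */
static uint32_t sum_split(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)          /* odd element left over */
        s0 += v[i];
    return s0 + s1;     /* unsigned wraparound keeps the result identical */
}
```

Both functions return the same value for any input, because unsigned
addition wraps and is associative; only the shape of the dependency chain
differs.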
Cycle-counting is also useful since most modern processors have similar
weaknesses and strengths. Multiplies & shifts are a classic example: a divide
by a constant can be converted to a multiply by a constant, and some constant
multiplies can in turn be converted into shifts. Pentium-IV is a little bit
different, but otherwise x86 processors generally favor the same simple
operations.
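
As a rough sketch of the divide-to-multiply and multiply-to-shift
conversions, written in C for readability: the function names div10_mul and
mul10_shift are hypothetical, and the constant 0xCCCCCCCD with a shift of 35
is the commonly used reciprocal for unsigned 32-bit division by 10. Other
divisors need their own constant and shift.

```c
#include <stdint.h>

/* x / 10 without a divide: multiply by ceil(2^35 / 10) = 0xCCCCCCCD
 * in 64-bit arithmetic, then shift the product right by 35. */
static uint32_t div10_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x * 10 without a multiply: 10*x = 8*x + 2*x = (x << 3) + (x << 1). */
static uint32_t mul10_shift(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

For example, div10_mul(99) gives 9 and mul10_shift(7) gives 70. Compilers
generally perform these conversions on their own when the divisor or
multiplier is a compile-time constant.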
-Matt