Re: Optimization
From: Paul Hsieh (qed_at_pobox.com)
Date: 02/11/04
- Next message: Richard Heathfield: "Re: C Please help me learn"
- Previous message: Michael N. Christoff: "Re: Why C# and Java have got it wrong"
- In reply to: Martin Eisenberg: "Re: Optimization"
- Next in thread: Martin Eisenberg: "Re: Optimization"
- Reply: Martin Eisenberg: "Re: Optimization"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 10 Feb 2004 22:05:41 -0800
Martin Eisenberg wrote:
> Paul Hsieh wrote:
> > So the computation, believe it or not, takes either 1 clock, or
> > 16 clocks depending on the success or failure of the branch
> > prediction. Assuming a 50% prediction rate this works out as (1
> > + 16)/2 = 8.5 clocks.
>
> Is the 50% assumption the best we can do without diving deep into the
> specifics of any particular call site?
No, 50% is in fact the statistically worst performance of the
predictor. I just picked it as an example. For the predictor to
perform well, the sequence of branch directions has to either follow a
short pattern or have a probabilistic bias (i.e., 90% taken versus
10% not taken will tend to be predicted fairly well regardless of the
pattern the branches come in).
Very often the branch is *very* predictable. For example, if you want
to find the minimum element of a very large array that's randomly
ordered, then the predictor will quickly lean very heavily toward
assuming that each successor is not the new minimum, and after roughly
(n / e) (where e = exp(1) = Napier's constant) elements on average the
predictor will lock correctly.
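One way to see how lopsided that branch is would be to count how often
it is actually taken. The following is only a sketch: the function
names count_min_updates and fill_random are made up for illustration,
and rand() with a fixed seed stands in for "randomly ordered" data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Scan a[0..n-1] for its minimum and count how many times the
   "found a new minimum" branch is taken. */
size_t count_min_updates(const int *a, size_t n, int *min_out)
{
    assert(a != NULL && min_out != NULL && n > 0);

    int min = a[0];
    size_t updates = 0;

    for (size_t i = 1; i < n; i++) {
        if (a[i] < min) {   /* the branch the predictor has to guess */
            min = a[i];
            updates++;
        }
    }
    *min_out = min;
    return updates;
}

/* Fill a[] with pseudo-random values (illustrative data only). */
void fill_random(int *a, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        a[i] = rand();
}
```

Comparing the returned count against n - 1 (the number of times the
branch is evaluated) gives the taken rate for a particular data set.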
In those cases you can just weight the two possibilities (well
predicted versus not) according to the probability of your branch.
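That weighting is just an expected value. A minimal sketch, using the
1 clock and 16 clock figures from the example quoted above; the
function name expected_branch_clocks is hypothetical, and the real
costs depend on the CPU in question.

```c
#include <assert.h>

/* Expected cost of a branch, given the probability that it is
   predicted correctly and the cost of each outcome in clocks. */
double expected_branch_clocks(double p_predicted,
                              double hit_clocks,
                              double miss_clocks)
{
    assert(p_predicted >= 0.0 && p_predicted <= 1.0);
    return p_predicted * hit_clocks
         + (1.0 - p_predicted) * miss_clocks;
}
```

With p_predicted = 0.5 and costs of 1 and 16 this reproduces the
(1 + 16)/2 = 8.5 figure; a higher prediction rate pulls the result
toward the 1 clock case.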
> > Assuming a previous generation compiler (like MSVC):
>
> Is VC 7.x "previous generation" as well? I hear its optimizer is much
> improved over version 6.
Who did you "hear" this from? Microsoft marketing perhaps? Look,
Intel has *EMBARRASSED* Microsoft with its truly amazing compiler. MS
is also starting to feel pressure from gcc, which has also improved by
leaps and bounds in the past 5 years. I'm sure they have been working
on their compiler and have convinced themselves that it is the
greatest thing since sliced bread, but Intel has left them (and
everyone else) so far behind it's not funny. (Intel has spent a small
fortune on hiring the absolute best compiler creators in the industry.
Remember that Intel doesn't rely on the revenues from their compiler
to stay afloat. So Microsoft cannot apply any kind of competitive
pressure to make Intel stop.)
I have not used VC 7.x personally, so I cannot say anything
authoritative about that compiler. But previous versions did not emit
cmovCC or any of the other post-Pentium instructions outside of inline
assembly.
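For reference, this is the kind of source a compiler can turn into a
cmovCC. It is only a sketch: whether a given compiler actually emits a
conditional move for min_int depends on the compiler, its version, and
the target flags, so check the disassembly. min_int_mask is a
hand-written branch-free alternative for compilers that won't. Both
function names are made up for illustration.

```c
#include <assert.h>

/* Plain ?: select. A compiler targeting P6/Athlon or later may
   compile this to cmp + cmovl, but that is not guaranteed. */
int min_int(int a, int b)
{
    return (b < a) ? b : a;
}

/* Branch-free select using a mask: -(a < b) is all ones when
   a < b and zero otherwise, so the XOR picks a or b. */
int min_int_mask(int a, int b)
{
    int mask = -(a < b);
    int r = b ^ ((a ^ b) & mask);
    assert(r == a || r == b);
    return r;
}
```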
> > The P6/Athlon CPUs support conditional move instructions like
> > "cmovl" which will directly translate flag results to a kind of
> > ?: operation.
>
> Ah, so I've actually misremembered my processor's age. I don't see a
> full instruction manual at AMD's documentation site,
Oh they've got one somewhere around there.
> [...] but I dare infer
> from your comment in conjunction with Intel's reference that the
> Athlon also has FP conditional moves.
They have FCOMI, but I don't remember about other new FP instructions.
AMD has some kind of capability-bits mechanism, with associated
documentation somewhere, that you can use to test for their presence.
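Assuming the capability bits in question are the standard CPUID
feature flags, a check could look roughly like the sketch below. It
relies on the <cpuid.h> header and __get_cpuid() helper shipped with
GCC and Clang, so it is compiler specific; the helper names
cpu_feature_edx, has_all, and has_fcmov_fcomi are made up here. In
CPUID function 1, EDX bit 0 reports an on-chip FPU and bit 15 reports
CMOV; Intel documents FCMOVcc and FCOMI as available when both bits
are set.

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>      /* GCC/Clang specific */
#endif

#define FEAT_FPU  (1u << 0)     /* CPUID.1:EDX bit 0  */
#define FEAT_CMOV (1u << 15)    /* CPUID.1:EDX bit 15 */

/* Return the EDX feature word from CPUID function 1, or 0 when
   CPUID is unavailable or this is not an x86 build. */
unsigned cpu_feature_edx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx;
#endif
    return 0;
}

/* True when every bit in mask is set in the feature word. */
int has_all(unsigned edx, unsigned mask)
{
    assert(mask != 0);
    return (edx & mask) == mask;
}

/* Integer conditional moves need only the CMOV bit. */
int has_cmov(unsigned edx)
{
    return has_all(edx, FEAT_CMOV);
}

/* FCMOVcc/FCOMI need both the CMOV bit and the FPU bit. */
int has_fcmov_fcomi(unsigned edx)
{
    return has_all(edx, FEAT_CMOV | FEAT_FPU);
}
```

A caller would do something like has_fcmov_fcomi(cpu_feature_edx())
once at startup and pick a code path accordingly.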
> > When you are in the floating point world, and using a processor
> > like the Athlon which takes time to communicate between the
> > integer and FP parts of the CPU, then the situation is just a
> > little more murky.
>
> I guess that's "a little" as in the "little bit" you know about
> optimization, the extent of which your quite interesting site
> reveals ;) By the way, how do FCOMI and relatives impact that
> situation?
An AMD insider informed me that they put some significant work into
FCOMI and that it's supposed to be fairly fast. From looking at the
disassembly that you showed in another post, it looks like one of your
solutions uses such instructions to avoid the transition to the
integer side of the CPU entirely. If that's the case, then that would
be yet another case that would need to be looked at.
-- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/