Re: Clock cycles list - help



On Mar 9, 7:53 am, "Guga" <Guga...@xxxxxxxxx> wrote:
On Mar 8, 8:46 pm, "Wolfgang Kern" <nowh...@xxxxxxxx> wrote:



Hallo Guga,

Thanks, Robert.. I think I got it..
Assuming I'm using them continuously, I made a simple formula that
shows the number of clock cycles those instructions take when used
that way (continuously):
Clocks = Latency + Throughput*(N-1)
N = number of instructions used (all of the same type), like the 1000
in the example you gave.
Latency = latency of the mnemonic
Throughput = throughput of the mnemonic
Clocks = total number of clocks for the sequence of mnemonics used
continuously
Is that it?

It would also work if throughput is <1,
which means several instructions may execute in parallel.
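
As a rough illustration of the formula discussed above, here is a minimal Python sketch. It assumes the reading Clocks = Latency + Throughput*(N-1), that all N instructions are of the same type and independent of each other, and that nothing else stalls the pipeline. The function name total_clocks is made up for this example, and the result is only an estimate, not a measurement.

```python
def total_clocks(latency, throughput, n):
    """Estimate clocks for n independent instructions of one type issued back to back.

    Assumption: the first instruction costs its full latency, and each
    further instruction adds one throughput interval.
    """
    if n < 1:
        return 0.0
    return latency + throughput * (n - 1)


# Latency 1 and throughput 0.5 are the CMP/TEST figures quoted in this thread.
print(total_clocks(1, 0.5, 1000))
```

With a throughput below 1, the estimate grows by less than one clock per extra instruction, which is the "several instructions in parallel" case.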

There is a timing calculation example in my AMD-docs ...
This formula spans half a page and is impractical for daily usage,
so I just use the lists as prepared by AMD:

|Instruction-group |Latency |Throughput |affected PIPES|

Intel has similar lists, but I also missed SSE timings there.

I once had timing information in my x86-disassembler,
but it used latency values only.
As this was neither exact nor even roughly right, I removed it for x86 altogether.
But other CPUs (good olde Z80 and followers) can work as RTCL :)

__
wolfgang

hi Wolfgang,

nice to see you again :)

Those lists are a bit confusing.. I think I'll do as you did and just
use the AMD list with latencies. I'm trying to make a list
containing the clock cycles of each mnemonic, but there are so many
different processors that it just adds to the confusion.

The best list I found so far was here: http://www.logix.cz/michal/doc/i386/chp17-00.htm

Sure.. it is old.. it is for the 386, but it displays the clock cycles in
an easy-to-read way.

The list Robert provided also covers the general-purpose
mnemonics.. like:

CMP/TEST latency: 1, Throughput = 0.5

So, I presume that they behave the same way as the SSE
instructions, right?

I mean, they work more or less like the formula I posted before,
right?

But.. if that is true... then why do these documents say that Jcc doesn't
have a latency?

Table C10 says that for a 0F2 processor the latency of Jcc is "not
applicable", yet it has a throughput of 0.5... But.. how is that
possible?

If an instruction doesn't have a latency to compute the clock
cycles it takes to be issued.. how does it work? I mean, it _could_ work
from the throughput alone, but.. if the latency is 0, shouldn't the
throughput also be 0? I thought throughput and latency were
related to each other.
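
One way to look at the question above, sketched in Python under stated assumptions: latency is usually taken as the delay before a dependent instruction can use a result, while throughput is the interval at which instructions of that type can be started. Under that reading the two numbers answer different questions, so an instruction that produces no result for others to wait on can still have a throughput. The function names are made up for this example, and treating "not applicable" as 0 is an assumption of the sketch, not something the tables state.

```python
def dependent_chain_clocks(latency, n):
    """Each instruction waits for the previous result, so latency dominates."""
    return latency * n


def independent_stream_clocks(throughput, n, latency=0.0):
    """Instructions do not wait on each other, so throughput dominates.

    Assumption: a latency listed as "not applicable" is modelled as 0.
    """
    if n < 1:
        return 0.0
    return latency + throughput * (n - 1)


# CMP/TEST figures quoted above: latency 1, throughput 0.5
print(dependent_chain_clocks(1, 1000))
print(independent_stream_clocks(0.5, 1000, latency=1))
# Jcc-like case: no latency figure, throughput 0.5
print(independent_stream_clocks(0.5, 1000))
```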

Best Regards,

Guga


Does anyone know where to get a list of the CPUID signatures of all
processors?

For example:
Pentium M - Banias is 0x69X
Pentium M - Dothan is 0x6DX
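
For reference, signatures like the ones above can be split into family, model and stepping. The Python sketch below assumes the usual layout of the value CPUID leaf 1 returns in EAX (stepping in bits 3:0, model in bits 7:4, family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20). The function name is made up for this example, and the stepping digits in the sample values are arbitrary stand-ins for the "X" above.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family = base_family
    model = base_model
    # The extended fields only count for base family 0x6 or 0xF.
    if base_family == 0xF:
        family = base_family + ext_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    return family, model, stepping


# Hypothetical steppings filled in for the "X" digit.
for signature in (0x695, 0x6D8):
    family, model, stepping = decode_cpuid_signature(signature)
    print(f"{signature:#x}: family {family:#x}, model {model:#x}, stepping {stepping:#x}")
```

Under the same layout, the "0F2" mentioned earlier would read as family 0xF, model 2.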


