Re: non load/store architecture?




David Brown wrote:

Where does that leave us? A pure RISC architecture is best when aiming
for maximal ipc, and can be run at higher clock speeds as each step is
simpler. But the ColdFire code is more compact, leading to lower
bandwidth requirements on the instruction bus, and it does more per
instruction, giving better performance for the same ipc.

Working on OpenGL device driver optimization for the DEC Alpha, I never
saw instruction cache misses, only data cache misses. Performance code
lives in small loops, not huge hulking one-shots. I say "more compact
code improves performance" is theory, and not observable in practice.
More compact data, on the other hand, matters a great deal.
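To put a rough number on the "compact data" point, here is a minimal sketch that counts how many cache lines a contiguous array spans. The 64-byte line size and the helper name `lines_touched` are assumptions for illustration, not figures for any particular Alpha part.

```c
#include <stddef.h>

/* Assumed cache line size in bytes; real hardware varies. */
#define LINE_BYTES 64

/* Number of cache lines spanned by a contiguous array of `count`
   elements of `elem_size` bytes each, assuming the array starts on
   a line boundary. Rounds up to a whole line. */
size_t lines_touched(size_t count, size_t elem_size)
{
    size_t bytes = count * elem_size;
    return (bytes + LINE_BYTES - 1) / LINE_BYTES;
}
```

Under those assumptions, 3000 vertices stored as three 8-byte doubles (24 bytes each) span 1125 lines, while the same vertices as three 4-byte floats (12 bytes each) span 563. Halving the element size roughly halves the data cache traffic for the same loop.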

Being low power is a good indicator of an efficient ISA - the x86 is not
low power, and neither are most fast RISC cores as they need high clock
speeds. Remember, what's important for a processor is the work done per
clock, not just instructions per clock, and in comparison to a pure RISC
architecture, the ColdFire sacrifices a little ipc for a lot more work
done per instruction.
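The claim in the paragraph above amounts to a simple product: work per clock equals instructions per clock times work per instruction. A minimal sketch, where the function name and every number in the usage note are hypothetical and not measurements of any real core:

```c
/* Work done per clock cycle, modeled as
   (instructions per clock) * (work units per instruction).
   Both inputs are averages over some workload. */
double work_per_clock(double ipc, double work_per_instr)
{
    return ipc * work_per_instr;
}
```

For example, a core with an ipc of 0.8 doing 1.5 units of work per instruction comes out at about 1.2 units per clock, ahead of a core with an ipc of 1.0 doing 1.0 unit per instruction. Whether the "work per instruction" factor really differs for the work that matters is exactly what the reply below disputes.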

The units of work I've always cared about are FPU adds, multiplies, and
divides. A denser instruction doesn't get any more arithmetic work
done. It could do more load/store work, but assuming you hit your
primary data cache, that's not your bottleneck anyway. The arithmetic
is. As I said above, instruction cache bloat doesn't matter in tightly
looping code. Or, I'd wager, in loosely looping code either.
Instruction caches are pretty big compared to the looping code. If all
your code is one-shot then you've got completely different system
caching issues, nothing to do with the CPU.
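A back-of-the-envelope check of the "loops fit in the instruction cache" argument can be sketched as below. The helper name `loop_fits_icache` and the sizes in the usage note are assumptions for illustration only.

```c
#include <stddef.h>

/* Returns 1 if a loop body of `instr_count` instructions, averaging
   `bytes_per_instr` bytes each, fits entirely within an instruction
   cache of `icache_bytes` bytes; returns 0 otherwise. Ignores
   alignment and associativity effects. */
int loop_fits_icache(size_t instr_count, size_t bytes_per_instr,
                     size_t icache_bytes)
{
    return instr_count * bytes_per_instr <= icache_bytes;
}
```

With fixed 4-byte instructions, a 200-instruction loop occupies 800 bytes, which fits in an assumed 8 KB instruction cache with plenty of room to spare. A denser encoding averaging 2 bytes per instruction would also fit, so once the loop is resident the encoding density makes no difference to the hit rate.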

Now I suppose if you design CPUs with almost no cache, you might care
about instructions being small. But then, you're not designing a
performance CPU anyway. So who's gonna care about the performance?
"Good" won't mean fast, it'll mean low power or cheap to manufacture
or something.


Cheers,
Brandon Van Every
