Re: non load/store architecture?
- From: "Brandon J. Van Every" <SeaFuncSpam@xxxxxxxxx>
- Date: 14 Dec 2006 20:33:19 -0800
David Brown wrote:
But in a lot of code it does matter.
Hold onto that thought, that it matters in "a lot" of code.
Take the
simple C code "x = 123456;", where "x" is a 32-bit global variable. On
the ColdFire, this compiles to:
move.l #123456,%d0
move.l %d0,x
Two instructions, each 6 bytes long, each executing in 1 clock (plus a
write access to memory).
On the PPC, this compiles to:
lis %r0,0x1
ori %r0,%r0,57920
lis %r9,x@ha
stw %r0,x@l(%r9)
(See what I mean about ColdFire code being nice and clear?)
Sure, but I got used to the load low, load high drill on the Alpha
easily enough.
That's four instructions, each 4 bytes, each executing in 1 clock (plus
a write access to memory).
The ColdFire generates more compact code, running at twice the speed for
the same clock. That's what I mean by greater work done per clock.
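For anyone not used to the PPC sequence quoted above, here is a minimal sketch in C of the arithmetic the assembler does when it splits a 32-bit value into 16-bit halves. The helper names (hi16, lo16, ha16, rebuild_address) are made up for illustration and are not from the post or from any toolchain. The point it shows: lis/ori can use the plain high half because ori zero-extends its immediate, while the @ha/@l pair has to add one to the high half whenever bit 15 of the low half is set, because the stw displacement is sign-extended.

```c
#include <assert.h>
#include <stdint.h>

/* Plain upper 16 bits: what "lis" loads when "ori" supplies the low half. */
uint16_t hi16(uint32_t v) { return (uint16_t)(v >> 16); }

/* Lower 16 bits: the "ori" immediate, or the @l displacement field. */
uint16_t lo16(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }

/* "High adjusted" half, as in x@ha: rounds up when bit 15 of the low
   half is set, to compensate for the sign-extended displacement. */
uint16_t ha16(uint32_t v) { return (uint16_t)((v + 0x8000u) >> 16); }

/* Rebuild an address the way "lis rN,x@ha ; stw rM,x@l(rN)" forms it:
   (ha << 16) plus the low half treated as a signed 16-bit displacement. */
uint32_t rebuild_address(uint32_t addr) {
    int32_t disp = (int32_t)lo16(addr);
    if (disp & 0x8000) {
        disp -= 0x10000;            /* sign-extend the 16-bit field */
    }
    return ((uint32_t)ha16(addr) << 16) + (uint32_t)disp;
}

/* Self-check on the constant from the post: 123456 == 0x1E240. */
void check_constant_split(void) {
    uint32_t value = 123456u;
    assert(hi16(value) == 1u);       /* lis %r0,0x1        */
    assert(lo16(value) == 57920u);   /* ori %r0,%r0,57920  */
    assert((((uint32_t)hi16(value) << 16) | lo16(value)) == value);
}
```

Calling check_constant_split() confirms that the two immediates in the lis/ori pair reassemble to 123456. rebuild_address() can be tried with any address; one whose low half has bit 15 set (0x1000C000 is an arbitrary example, not the real address of x) exercises the @ha adjustment.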
So it's faster at loading an immediate 32-bit constant. Big deal!
That is not an important job. The RISC tradeoff is legitimate here.
You don't really need to load immediate constants very much, so it is
acceptable to do an "instruction dance" that doesn't typically run
slower anyways, because the pipelining masks the latency. So you get
to keep all your instructions 4 bytes long, which simplifies your
decoder, and also your code cache alignment.
The Alpha had instructions for loading immediate constants that were
more likely to matter. 16-bit constants could be done in 1
instruction, and 3 or 4 bit constants were typically part of the
instruction itself.
Now I suppose if you design CPUs with almost no cache, you might care
about instructions being small. But then, you're not designing a
performance CPU anyways. So who's gonna care about the performance?
"Good" won't mean optimization, it'll mean low power or cheap to
manufacture or something.
We are clearly coming at this from different experiences, if OpenGL
drivers on an Alpha are typical for your programming, while I work
mostly with smaller processors (the ColdFire I am using at the moment
has no cache - all its flash and sram are internal, with single cycle
access). But performance is very important to small systems - high
performance means you can use slower clock speeds, leading to lower
power, lower EMI, and cheaper components. It might not be the most
important factor, but it is still there.
Yep, very different. Part of why I started posting here, to figure out
what's different about "the kind of ASM I know" vs. "the kind of ASM
embedded engineers typically do."
Even on cached processors, small code means better use of the cache.
Critical loops will (should!) fit within even a small instruction cache,
but programs consist of more than their critical loops. A complete
instruction cache miss might mean a stall of a hundred or more clock
cycles (which might be worth twice that in instruction counts on a
superscalar processor) - there is a reason why more expensive processors
have larger instruction caches. More compact code gives the same
benefits as a larger cache.
Not unless you can *really* compact the code.
Cheers,
Brandon Van Every