Re: non load/store architecture?
- From: David Brown <david@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 14 Dec 2006 09:58:49 +0100
Brandon J. Van Every wrote:
David Brown wrote:
Brandon J. Van Every wrote:
CISC, i.e. having arithmetic instructions that can access memory, i.e.
the x86 architecture we love to hate, is hard to optimize. Your
statement, "simplifies register assignment for (the compiler)" only
applies if you have no intention of producing optimal code.
One of the liabilities of RISC, i.e. load/store, is that programmers
don't tend to write code with any kind of load/store discipline. They
write whatever the hell they feel like writing. If it's the garden
variety industrial "*** code," it will contain sequences of memory
accesses that cannot be changed. At least, that the compiler can't
prove are safe to change. Programmers can write great code on a RISC
architecture if they're RISC-aware. LOAD everything up, do the math,
STORE the results. But generally they aren't RISC-aware, and don't
want to be.
Especially in a PC marketplace dominated by the x86. At DEC, we ran
into this issue all the time when porting device driver code to the DEC
Alpha. It's also why the Alpha's x86 emulation was never all that
great. x86 code, in general, simply couldn't be re-scheduled in RISC
load/store fashion. The Alpha may have been the world's best CPU in
its day, but it didn't have a commensurately kick*** memory
controller, so we suffered.
x86 is still here and the Alpha's dead. This is a clear case of "Worse
Is Better." That is, you spread your marketshare by being 1st with the
quick-n-dirty version. Also, if your inelegant solution runs on average
machines, you spread faster. Those who hold out for "Better Is Better"
tend to marginalize their marketshare. Look at what DEC was: great
engineering company, couldn't market its way out of a paper bag. Look
at the Wintel hegemony: still with us today, still has the same
business model of "Worse Is Better."
Cheers,
Brandon Van Every
x86 is not a bad architecture because of being a non-load/store
architecture, or because it is CISC. There is nothing inherently bad
about being CISC, and nothing inherently good about RISC.
I have the bias of a performance jock. I define "good" as
"optimizable." Yes, RISC is inherently better than CISC for
optimization; that is the point of RISC. I also think it is easier to
write optimizing compilers for RISC than for VLIW, judging by the
industry's experience with the Itanium. But at least, VLIW attempts to
regiment instruction scheduling. That's what you have to do to get
performance. The CISC "whatever length, whatever latency" stuff does
not cut it.
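The "LOAD everything up, do the math, STORE the results" discipline mentioned earlier in the thread can be sketched in C. This is only an illustration; the function names are made up for the example. In the first version the compiler generally has to assume that total might point into a[], so it cannot safely keep the running sum in a register; in the second the programmer has done the load/store scheduling by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Memory-to-memory style: the running sum lives behind a pointer.
 * Since *total could alias an element of a[], each iteration is
 * typically a load of *total, an add, and a store back. */
void sum_naive(int *total, const int *a, size_t n)
{
    assert(total != NULL && (a != NULL || n == 0));
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += a[i];
}

/* Load/store style: accumulate in a local (a register, in practice),
 * then store the result exactly once at the end. */
void sum_load_store(int *total, const int *a, size_t n)
{
    assert(total != NULL && (a != NULL || n == 0));
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];          /* one load per element, no stores */
    *total = acc;             /* single store of the result */
}
```

The two functions agree whenever total does not point into a[]. If it does, they can give different answers, which is exactly why a compiler cannot turn the first form into the second on its own without proving the accesses do not overlap.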
"Good" for an ISA can mean many things. From a low-level programmer's viewpoint, a "good" ISA is easy to work with at the assembly level, and it's (relatively :-) easy to make an optimising C compiler for it. It should provide whatever OS support functions (MMU, traps, etc.) are appropriate for the size of the cpu and its applications. From the hardware viewpoint, it should be possible to make implementations that are small, low-power, give high instructions-per-clock, low branch overhead, small code size, etc.
If pure performance is your only requirement, then that means good compiler support, high IPC, and low latencies. Since you brought up the Itanium (of which I only know a little), we can compare a few architectures ranging from pure CISC (x86), half-way (ColdFire - it's technically CISC, but has many RISC features), pure RISC (PPC), and VLIW (Itanium).
From the compiler writer's viewpoint, the sweet-spot is probably the ColdFire, possibly the PPC. Lots of registers and an orthogonal instruction set are important - both have these. The ColdFire wins out because lots of common sequences that require two or more RISC instructions can be handled as one (function preludes and cleanups are much simpler with the 68k, and extremely common "load data, use data" sequences are shorter and faster on the 68k, and don't require an extra register). The x86 is horrible for compilers, requiring all sorts of tricks to make good code (although modern versions are better). Instruction prefixes must be a nightmare. And VLIW requires incredible compiler fortune-telling abilities in order to get good instruction sequencing.
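To make the "load data, use data" point concrete, here is a small C loop with hand-written instruction sequences in the comments. The assembly is an illustrative sketch, not real compiler output, and the register choices are arbitrary; a real PPC compiler might well use an update-form load (lwzu) to fold the pointer bump into the load.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n 32-bit words. The loop body is a classic "load data, use data"
 * pair.
 *
 * ColdFire/68k style, one instruction with a memory operand:
 *     add.l  (%a0)+,%d0     ; load, add, and post-increment the pointer
 *
 * Load/store RISC style (PowerPC-like), needing a scratch register:
 *     lwz    r5,0(r3)       ; load the word into r5
 *     addi   r3,r3,4        ; bump the pointer
 *     add    r4,r4,r5       ; use the loaded value
 */
int32_t sum_words(const int32_t *p, size_t n)
{
    int32_t sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}
```

The C source is the same for both targets; the difference is only in how many instructions and registers the loop body costs.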
For the hardware implementation, a well-constructed CISC ISA is easy on a small system with consistent, fast memory (such as a microcontroller). Of course, the x86 is by no means well-constructed, with its prefix codes. And for faster processors, it involves all sorts of complex register renaming schemes to allow pipelined and superscalar execution. Instruction decoding logic is large and slow, and pipelines are often long, complex, and inconsistent across instruction types (leading to long delays on mispredicted branches). The lack of registers means much more memory IO, which causes stalls and requires complex scheduling.
RISC is much easier in this way - the instruction coding is far simpler, and there is much greater similarity across instruction types. Because you have many more registers, there are fewer bottleneck registers, and thus much less need for register renaming and other tricks. The optional condition code updates of the PPC also help earlier branch prediction.
The ColdFire lies somewhat in between - its decode logic is harder than a pure RISC cpu would need, but far simpler than an x86. You need some anonymous registers to deal with direct memory operands, but not many, as much of the code is RISC-style register-register.
The VLIW cpu can be made with very high IPC, but only for ideal code - it can't reschedule instructions dynamically, and is thus only fast for processing large loops (assuming that no data outside the L1 cache is read or written during the loop).
Where does that leave us? A pure RISC architecture is best when aiming for maximal IPC, and can be run at higher clock speeds as each step is simpler. But the ColdFire code is more compact, leading to lower bandwidth requirements on the instruction bus, and it does more per instruction, giving better performance for the same IPC. VLIW is a failed concept in most cases (it can be useful in DSPs, and some scientific programs, but not for general use), and it is not even worth considering making a fast core that executes x86 instructions directly - modern cores translate the x86 code into an internal RISC code. (I believe the high-speed ColdFire cores do that too to some extent - most instructions are RISCy enough to implement directly, but some are broken into a few RISC codes in the decoder.)
They give
different scope for different sorts of implementation, but it is
possible to make a bad RISC ISA or a good CISC ISA (the 68k/ColdFire is
an example of a very nice CISC ISA).
Define "nice." Maybe you think it's nice because it's low power or
easy to program or something.
Being low power is a good indicator of an efficient ISA - the x86 is not low power, and neither are most fast RISC cores as they need high clock speeds. Remember, what's important for a processor is the work done per clock, not just instructions per clock, and in comparison to a pure RISC architecture, the ColdFire sacrifices a little IPC for a lot more work done per instruction.
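As a back-of-the-envelope sketch of that trade-off: work per clock is just IPC multiplied by the average work done per instruction. The function name and the numbers in the note below are made up purely for illustration, not measurements of any real core.

```c
#include <assert.h>

/* work per clock = (instructions per clock) * (work per instruction).
 * "Work" is in arbitrary units, e.g. with one simple RISC-style
 * operation counted as 1.0. */
double work_per_clock(double ipc, double work_per_insn)
{
    assert(ipc >= 0.0 && work_per_insn >= 0.0);
    return ipc * work_per_insn;
}
```

With hypothetical figures, a pure RISC core at 1.0 IPC and 1.0 units per instruction gives work_per_clock(1.0, 1.0) = 1.0, while a core that drops to 0.75 IPC but averages 1.5 units per instruction gives work_per_clock(0.75, 1.5) = 1.125, so it comes out ahead at the same clock speed.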
But the most obvious "nice" feature of the ColdFire ISA can be seen by looking at assembly listings. Writing nearly optimal code for it is easy, and (equally important) it is easy to understand generated optimal code. There is no need for x86-style abuse of addressing modes to write good code, and there is no need to have a 600 page manual on-hand to interpret the nuances of PPC instruction codes. This makes it nice for the programmer, nice for the compiler writer, and nice for the hardware implementer.
Additionally, this being c.a.e., ColdFire is eminently suitable for embedded systems. Its support for and handling of interrupts and exceptions, in particular, is excellent (I know there are other cpus, like the RISC ARM, that are also good), and compact code is a clear benefit.
The x86 was widely held to be a
poor and limited design when the 8086 first came out, and modern x86
chips have a terrible architecture (but with some very nice
implementations) as a result of incremental steps keeping backwards
compatibility with such a bad starting point.
Yep, Worse Is Better.
Modern x86 implementations are like turds polished until they glow in the dark.
Cheers,
Brandon Van Every