Re: CISC vs RISC concepts -- from an assembly view



In article <1145933365.997862.49780@e56g2000cwe.googlegroups.com>,
spamtrap@xxxxxxxxxx says...
Now this question just popped out of my head. I'm viewing this as a C
program to an x86 assembly output.

CISC CPU speed relies on microprocessor optimizations (like the 3DNow!,
MMX, SSE2, and SSE3 extensions), which are enabled by the
compiler.

And RISC CPUs rely for their speed on _compiler_ optimizations (reducing
what the user is trying to accomplish to the least number of
operations).

So if this is true, shouldn't everyone be using RISC processors and
just use really smart compilers to create their executables?

No -- rather the opposite. Optimizing the hardware is
more effective than optimizing the software. Worse, RISC
optimizes all the wrong things. RISC optimizes decoding
of instructions, at the expense of almost everything
else. The "everything else" specifically includes program
size. Larger programs require larger caches and/or
greater memory bandwidth.

If you look, you'll notice that on a modern CPU, the
instruction decoders are typically less than 1% of the
CPU area, while the caches (for example) are often more
than 50% of the CPU area. Even if you could completely
eliminate that 1% of the CPU area, almost any increase in
the cache size needed to hold the larger code means that
overall you get a bigger, more expensive CPU.
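
To put rough numbers on that trade-off, here is a small Python sketch. The 1% and 50% shares are the figures quoted above; the function names and the idea of expressing cache growth as a fraction are only there for illustration, and real die floorplans are obviously not this tidy.

```python
def relative_die_area(decoder_share=0.01, cache_share=0.50,
                      cache_growth=0.0, remove_decoder=False):
    """Return die area relative to the baseline die (baseline = 1.0).

    decoder_share and cache_share are fractions of the baseline die.
    cache_growth is the fractional increase in cache area (0.10 = 10% more).
    """
    other = 1.0 - decoder_share - cache_share      # everything else on the die
    decoder = 0.0 if remove_decoder else decoder_share
    cache = cache_share * (1.0 + cache_growth)
    return other + decoder + cache


def break_even_cache_growth(decoder_share=0.01, cache_share=0.50):
    """Cache growth at which the added cache area equals the decoder area."""
    return decoder_share / cache_share


# Hypothetical case: decoders removed entirely, cache grown by 10%.
print(relative_die_area(cache_growth=0.10, remove_decoder=True))
print(break_even_cache_growth())
```

With those shares, growing the cache by anything more than about 2% already costs more area than deleting the decoders saves.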

In fairness, the instruction decoders are complex,
expensive logic to design, while a larger cache is
(mostly) fairly cheap to design. RISC CPUs are cheaper
and easier to design, but more expensive to manufacture.
For low enough volume applications, that can make sense.
When the volume is higher, the CISC will end up a much
better deal because that expensive design work gets
amortized across a much larger market.

Because that way the RISC can go extremely fast (1:1 ratio of CPU
Cycles : Operations -- or at least close to this), and all of the
advanced math stuff (that Intel uses -- SSE2, SSE3) would be embedded
within the executable output.

Virtually every modern CPU can actually execute more than
one instruction per clock cycle. The CPU looks at the
instructions and based on their resource usage, figures
out which don't depend on prior instructions, so they can
be executed in parallel with those prior instructions.
Most also support out of order execution, so a later non-
dependent instruction can execute while an earlier
instruction that depends (for example) on a load from
memory waits for its data.
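
A toy model may make the idea concrete. The Python sketch below is a heavily idealized issue scheduler, not a description of any real CPU: the four-instruction program, the latencies, and the issue width are all made-up assumptions, and dependencies are given directly as indices of earlier instructions (as if register renaming had already removed everything except true data dependencies).

```python
def issue_cycles(program, width=2):
    """Return the cycle in which each instruction issues.

    program: list of (name, deps, latency) in program order, where deps
    holds indices of earlier instructions whose results are needed and
    latency (>= 1) is the number of cycles until the result is available.
    Each cycle, up to `width` ready instructions issue, oldest first,
    even if an older instruction is still waiting (out-of-order issue).
    """
    n = len(program)
    issued = [None] * n   # cycle each instruction issued in
    ready_at = [None] * n # cycle each result becomes available
    cycle = 0
    remaining = n
    while remaining:
        slots = width
        for i, (_name, deps, latency) in enumerate(program):
            if slots == 0:
                break
            if issued[i] is not None:
                continue
            if all(issued[d] is not None and ready_at[d] <= cycle for d in deps):
                issued[i] = cycle
                ready_at[i] = cycle + latency
                slots -= 1
                remaining -= 1
        cycle += 1
    return issued


# Hypothetical program: the add waits on a slow load, the mul/sub pair does not.
program = [
    ("load r1, [mem]", [],  3),
    ("add  r2, r1, 1", [0], 1),
    ("mul  r3, r4, r5", [], 1),
    ("sub  r6, r3, r7", [2], 1),
]
print(issue_cycles(program, width=2))
```

In this model the mul issues alongside the load and the sub follows one cycle later, while the add sits waiting for the load's data; that is the "later non-dependent instruction" case described above.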

Unfortunately, RISC instruction streams don't generally
display a lot more parallelism than CISC instruction
streams. In either case, you generally have to provide a
LOT more CPU resources to get even a little more
parallelism in actual practice. IOW, the simplicity of
the instructions doesn't really contribute much to
executing them much faster.

Worse still, remember what was pointed out previously --
that RISC instruction streams tend to be longer than CISC
instruction streams. Not only do instructions tend to be
larger, but being simpler means you often need more of
them to get the same amount of real work done. That means
they _need_ to execute faster (either a higher clock speed
or more in parallel) just to maintain parity with a CISC.
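
The parity argument is just arithmetic on run time = instructions / (instructions per cycle x clock). A minimal Python sketch follows; the instruction counts, the 30% expansion, and the clock speed are hypothetical numbers chosen only to show the shape of the calculation.

```python
def run_time(instruction_count, ipc, clock_hz):
    """Seconds needed to execute instruction_count instructions."""
    return instruction_count / (ipc * clock_hz)


def parity_clock(cisc_count, risc_count, cisc_clock_hz):
    """Clock the RISC needs to match the CISC run time at equal IPC."""
    return cisc_clock_hz * risc_count / cisc_count


# Hypothetical: the RISC build needs 30% more instructions for the same work.
cisc_count, risc_count = 1_000_000, 1_300_000
needed = parity_clock(cisc_count, risc_count, cisc_clock_hz=2.0e9)
print(needed)
print(run_time(cisc_count, 2.0, 2.0e9), run_time(risc_count, 2.0, needed))
```

The same factor could instead be made up with more instructions per cycle, but either way the extra throughput only buys a tie.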

A RISC CPU that attempts to compete with a CISC CPU often
has a die four times as big -- and the cost of a die
is typically related to at least the square of its area,
so four times as big means at least 16 times as
expensive.
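
As a quick check of that arithmetic, here is the rule of thumb written out in Python. The exponent of 2 is the "at least the square" figure from the paragraph above, taken as a lower bound; it is a rule of thumb, not a yield model, and the function name is only illustrative.

```python
def die_cost_ratio(area_ratio, exponent=2):
    """Relative die cost if cost scales as area ** exponent."""
    return area_ratio ** exponent


print(die_cost_ratio(4))     # four times the area, exponent 2
print(die_cost_ratio(4, 3))  # a steeper hypothetical exponent
```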

RISC and CISC CPUs are both trending toward more cores on
a die, but for entirely different reasons. On the RISC
side, the reason is entirely speed -- their cores are
individually slower, yet they're mostly (almost by
definition) sold into markets where speed matters a lot
more than cost, so they're doing almost anything they can
just to keep up.

On the CISC side, the reason is entirely different. With
shrinks in process technology, CISC CPUs are in danger of
becoming so small that they'd become nearly free. If
you're Intel or AMD, the prospect of CPU prices falling
from a few hundred dollars to a few dollars sounds like a
pretty bad idea, especially when the market doesn't seem to
be growing to compensate. Adding more cores is an obvious
way of improving performance (at least a little) while
keeping the average selling price in a range they like.

Another obvious move is to simply integrate more of the
computer onto the CPU die. This isn't very attractive to
Intel though, since moving more onto the CPU removes more
from the associated chipset -- which, in most cases, is
an Intel product as well. Since AMD does far less in the
chipset market, they've been much more aggressive about
moving chipset functionality into the CPU.

--
Later,
Jerry.

The universe is a figment of its own imagination.
