Re: Multiprocessor Core




Randy Hyde wrote:
>
> I'm not sure what assembly has to do with it,

I think it does, because some new instructions might appear in the
instruction set.

> but you do realize that
> the "central microprocessor" will become the bottleneck, throttling
> down the performance of the entire system, right?

Yes, but the task of assigning jobs to the other microprocessors
shouldn't be that heavy. NASA or some other group even used a public
network of computers to do really heavy computation that was
previously possible only on supercomputers.
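
To put rough numbers on this, here is a minimal sketch of a toy model. The
model and all names and figures in it (jobs_per_second, saturation_workers,
10 us to dispatch a job, 1000 us to run one) are assumptions made up for
illustration, not measurements of any real system. It only shows that a
central dispatcher limits throughput once the worker count exceeds the
ratio of job time to dispatch time.

```python
# Toy model (an assumption for illustration, not a measurement): one central
# processor hands out jobs, taking dispatch_us microseconds per job, and each
# worker processor needs job_us microseconds to finish a job.

def jobs_per_second(workers, dispatch_us, job_us):
    """Sustained throughput: whichever side is slower sets the pace."""
    dispatcher_limit = 1_000_000 / dispatch_us      # jobs/s the dispatcher can hand out
    workers_limit = workers * 1_000_000 / job_us    # jobs/s the workers can finish
    return min(dispatcher_limit, workers_limit)


def saturation_workers(dispatch_us, job_us):
    """Worker count beyond which the dispatcher becomes the bottleneck."""
    return job_us / dispatch_us


if __name__ == "__main__":
    # Hypothetical costs: 10 us to dispatch a job, 1000 us to execute it.
    for n in (4, 16, 64, 256):
        print(n, "workers:", jobs_per_second(n, dispatch_us=10, job_us=1000), "jobs/s")
    print("dispatcher saturates at", saturation_workers(10, 1000), "workers")
```

Under these made-up costs the central processor is not the limit until about
100 workers, so both points hold: dispatching is cheap when jobs are coarse,
and it still becomes the bottleneck if jobs are small or workers are many.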

> Note that once you get above four processor, you reach the point of
> diminishing returns in standard shared memory multiprocessor systems.

The solution would be to give each microprocessor a separate memory
block (chip), just as an OS gives each process its own memory. Here we
may finally reach the point where we no longer need an operating system
to manage memory. The OS would be used just to provide a GUI and
runtime libraries. It's still unclear how to share video memory,
however. Maybe we could have several small graphics cards too (I
believe that's already possible).

> You need special busses to get decent performance above that point.
> Also, if you're trying to improve the performance of a single
> application via multithreading, it's real hard to get good performance
> gains except for specialized algorithms once you get above 4-16
> threads.

Dividing even an easy and simple task among 4 or more CPUs is a really
hard job. Can there really be any performance gain if an Intel
dual-core Xeon runs at most 4 software threads simultaneously?
(Assuming that we have an unending data supply at maximum speed.)
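
One way to estimate the possible gain is Amdahl's law, which is not
mentioned above but is the standard formula for this question: if a
fraction p of the work can run in parallel on n processors, the speedup
is 1 / ((1 - p) + p / n). A minimal sketch follows; the function name
amdahl_speedup and the value p = 0.9 are assumptions chosen only for
illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup from Amdahl's law, ignoring memory and bus contention."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time can be parallelized.
    for n in (1, 2, 4, 16, 64):
        print(n, "threads:", round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speedup is about 3.08 on 4 threads and 6.4 on 16, and it
can never exceed 10 however many processors are added. That agrees with
the point about diminishing returns past 4-16 threads, and real hardware
does worse because the formula ignores bus and memory contention.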
