Re: Multiprocessor Core




emu8086@xxxxxxxxxxxxxx wrote:
> Randy Hyde wrote:
> >
> > I'm not sure what assembly has to do with it,
>
> I think it has, because there might appear some new instructions in the
> instruction set.

Actually, I was referring to my comment...

>
> > but you do realize that
> > the "central microprocessor" will become the bottleneck, throttling
> > down the performance of the entire system, right?
>
> Yes, but the task of assigning the jobs for other microprocessors
> shouldn't be that heavy. NASA or some other fellows even used a public
> network of computers that did some really heavy computation that
> previously was possible for super-computers only.

Sure, *some* algorithms can be converted to parallel computation easily
enough. But don't assume this is true for arbitrary algorithms.
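
To make the distinction concrete, here is a minimal Python sketch (not
from the original discussion; the function names and the choice of a
thread pool are assumptions for illustration). The first function splits
into independent chunks, so the work can be handed to separate workers.
The second is a recurrence in which every step needs the previous
result, so extra CPUs cannot help. Note that CPython threads share one
interpreter lock, so this shows the shape of the decomposition rather
than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares_parallel(values, workers=4):
    """Independent chunks: each worker sums its own slice, then we combine."""
    # Stride slicing gives each worker a disjoint share of the input.
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda chunk: sum(v * v for v in chunk), chunks)
        return sum(partials)


def logistic_map(x0, r, steps):
    """Serial chain: step k cannot start until step k-1 has finished."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x
```

The first function needs no communication between workers until the
final sum; the second has no such split, whatever the hardware.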

>
> > Note that once you get above four processors, you reach the point of
> > diminishing returns in standard shared memory multiprocessor systems.
>
> The solution would be to give each microprocessor a separate memory
> block (chip), just like OS does.

But that pretty much forces a process to run on one CPU only and makes
load balancing an expensive task. Ideally, processors can execute
programs from any memory location, so a CPU can be assigned to an
arbitrary execution thread. If the process is sitting in memory that a
CPU cannot access (e.g., the block of memory associated with another
CPU) you either have to spend time moving data from that block to your
CPU's block, or you have to leave the process frozen while the CPU is
busy, even if other CPUs are available.


> Here we may finally reach that level
> when we will not require any operating system (to manage memory). OS
> will be used just to provide some GUI and runtime libraries. It's still
> unclear how to share video memory, however. Maybe to have several
> micro-graphics cards too (I believe that's already possible).

I have one word for you: NUMA (which is actually *three* words:
non-uniform memory access). There has been a lot of research into this
area. But most of the research is in the area of OSes and how OSes can
coordinate the execution of processes when executing out of one memory
location is more expensive than executing out of another.
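
As a rough illustration of the kind of decision such an OS has to make,
here is a minimal Python sketch. Everything in it is hypothetical: the
`pick_cpu` helper, the two-node layout, and the relative access costs
are made up for the example and do not describe any real scheduler.

```python
def pick_cpu(home_node, idle_cpus, cpu_node, access_cost):
    """Return the idle CPU that reaches home_node's memory most cheaply.

    home_node   -- memory node holding the process's data
    idle_cpus   -- list of CPU ids that are currently free
    cpu_node    -- dict mapping CPU id to the node it sits on
    access_cost -- access_cost[a][b] is the (assumed) relative cost for a
                   CPU on node a to touch memory on node b
    """
    if not idle_cpus:
        return None  # nothing free: the process has to wait
    return min(idle_cpus, key=lambda cpu: access_cost[cpu_node[cpu]][home_node])


# Hypothetical machine: CPUs 0-1 on node 0, CPUs 2-3 on node 1.
cpu_node = {0: 0, 1: 0, 2: 1, 3: 1}
# Assumed costs: local access 1.0, remote access 2.5.
access_cost = [[1.0, 2.5],
               [2.5, 1.0]]
```

With the process's data on node 1, `pick_cpu(1, [0, 1, 2], cpu_node,
access_cost)` prefers CPU 2. If only CPUs 0 and 1 are idle, it still
returns one of them, and the process runs at the higher remote cost
instead of staying frozen.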

>
> > You need special busses to get decent performance above that point.
> > Also, if you're trying to improve the performance of a single
> > application via multithreading, it's real hard to get good performance
> > gains except for specialized algorithms once you get above 4-16
> > threads.
>
> Dividing some easy and simple task between 4 or more CPUs is really a
> hard job. Can there be really any performance gains if Intel dual core
> Xeon supports up to 4 simultaneous software threads only? (assuming
> that we have unending data supply at maximum speed).

Depends entirely on the algorithm. Today, the main issue isn't so much
making one program run four times faster, but making four separate
programs (or threads) run four times faster (than they would on a
single CPU). That's a much easier task. Then, it's up to the app
developers to make their apps multithreaded (the hard work) if they
want the apps themselves to run faster. That, I'm afraid, isn't
something we're going to be able to do exclusively in hardware.
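
One common way to put numbers on "depends entirely on the algorithm" is
Amdahl's law: if a fraction p of the work can run in parallel, n CPUs
give a speedup of 1 / ((1 - p) + p / n). The sketch below is an
illustration added for this point, not something from the original
thread; the function name is an assumption.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Ideal speedup on n_cpus when parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many CPUs exist.
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)
```

For a program that is 90% parallel, `amdahl_speedup(0.9, 16)` is about
6.4, and the limit as CPUs are added is 10. Four fully independent
programs correspond to p = 1, where four CPUs really do give a factor of
four.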

Cheers,
Randy Hyde
