Re: Atmel releasing FLASH AVR32 ?



"Wilco Dijkstra" <Wilco_dot_Dijkstra@xxxxxxxxxxxx> wrote in message
news:e8eRh.2250$gr2.1244@xxxxxxxxxxxxxxxxxxxxxxx

"Ulf Samuelsson" <ulf@xxxxxxxxxxxxx> wrote in message
news:ev2i1h$qhh$1@xxxxxxxxxxx

Multithreading on a high-end general-purpose CPU brings problems of its
own, especially cache thrashing.

Absolutely. The "solution" is to add more cache...

No, the solution is to have more associativity in the cache.
Having 4GB of direct mapped cache will not help you when
two threads start using the same cache line.
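
A minimal sketch of the associativity point, assuming an LRU replacement
policy, 64-byte lines and two made-up addresses exactly one cache size apart
(count_misses and all the numbers are illustrative, not taken from any real
CPU). Both configurations hold the same number of lines; only the
associativity differs.

```python
from collections import OrderedDict

def count_misses(addresses, num_sets, ways, line_size=64):
    """Count misses in an LRU set-associative cache (ways=1 is direct mapped)."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_sets        # which set the line maps to
        tag = line // num_sets         # identifies the line within that set
        entries = sets[index]
        if tag in entries:
            entries.move_to_end(tag)   # hit: mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict the least recently used line
            entries[tag] = True
    return misses

LINE = 64
NUM_LINES = 1024                       # assumed total capacity, in lines
CACHE_BYTES = LINE * NUM_LINES

# Two hypothetical threads alternate between addresses one cache size apart,
# so they collide on the same index in the direct-mapped case.
thread_a = 0x1000
thread_b = thread_a + CACHE_BYTES
trace = [thread_a, thread_b] * 100

direct_mapped_misses = count_misses(trace, num_sets=NUM_LINES, ways=1)
two_way_misses = count_misses(trace, num_sets=NUM_LINES // 2, ways=2)
print(direct_mapped_misses, two_way_misses)
```

In this toy trace the direct-mapped cache misses on every access, while the
two-way cache misses only on the first touch of each line. Making the
direct-mapped cache larger does not change this as long as the two addresses
still map to the same index.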

With an embedded core, where you use tightly coupled high-bandwidth memory
for most of the threads, you do not have that problem.

Same solution: more fast on-chip memory.

If you want to solve the problem for general-purpose symmetric
multiprocessing by putting the application memory on the chip,
you are going to run into significant problems.
You are beginning to get out of touch with reality, my dear friend.

I think it is eminently useful for asymmetric multiprocessing where
you have some dedicated tasks to do which are best implemented
in a separate CPU to avoid real time response conflicts and can
be implemented in a low end 32 bitter.

I'm not quite sure what you're saying here. Are you advocating
asymmetric multiprocessing or asymmetric multithreading?


I am saying that it is cheaper to use asymmetric multithreading
than asymmetric multiprocessing.

I think you need to stop trying to explain why a single CPU
is better than a multithreaded CPU, because no one is
using a single CPU for implementing two simultaneously
operating software MACs.

First of all, you're the one that claims one CPU is better than 2...
I believe 2 CPUs is better in many cases - multicore is the future.
However if you do move to a single (faster) CPU then it doesn't
make much difference in terms of realtime response whether that
CPU is multithreaded or not. You seem to believe that threads are
somehow much better than interrupts - but as I've shown they are
equivalent concepts.

In order for interrupts to be equivalent to multithreading,
where you can select an instruction to execute from a
different interrupt on every new clock cycle, you have to add
additional constraints to your "interrupt" system.

You have to have multiple register files and multiple program counters
in the system.
You have to add additional hardware to dynamically raise/lower priorities
in order to distribute instructions among the different interrupts.
Your "interrupt" driven system is likely to be mistaken for a
multithreading system.
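
A toy cycle-count model of that argument, as a sketch only. Context,
run_interleaved and the 10-cycle switch cost are all hypothetical; the cost is
a placeholder for saving and restoring a shared register file, not a measured
figure for any real core. Each Context carries its own program counter and
register file, and one instruction is issued per turn, round-robin.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """One execution context with a private program counter and register file."""
    program_length: int
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 8)

    def done(self):
        return self.pc >= self.program_length

    def step(self):
        self.regs[0] += 1      # stand-in for executing one instruction
        self.pc += 1

def run_interleaved(contexts, switch_cost):
    """Issue one instruction per turn, rotating round-robin over the contexts.

    switch_cost is the number of extra cycles charged whenever the issuing
    context changes: 0 models private register files and program counters,
    a positive value models save/restore through a single shared set.
    """
    cycles = 0
    previous = None
    while any(not c.done() for c in contexts):
        for c in contexts:
            if c.done():
                continue
            if previous is not None and previous is not c:
                cycles += switch_cost
            c.step()
            cycles += 1
            previous = c
    return cycles

def make_contexts():
    # Two hypothetical tasks of 100 instructions each.
    return [Context(program_length=100), Context(program_length=100)]

hw_threads_cycles = run_interleaved(make_contexts(), switch_cost=0)
shared_regs_cycles = run_interleaved(make_contexts(), switch_cost=10)
print(hw_threads_cycles, shared_regs_cycles)
```

With a switch cost of zero the two tasks finish in exactly as many cycles as
they have instructions. With any nonzero cost, switching on every instruction
is dominated by the switching itself, which is why per-cycle interleaving
needs duplicated state in hardware.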

Your way of discussing is way off; you ignore ALL arguments
and requests to prove your point, in favour of continued rambling...

You need to show that the given example (multiple SPI slaves)
can be handled equally well by an *existing* interrupt driven system
as it can by an *existing* multithreaded system
like the zero context switch cost MIPS processor.

I now put the flip on the shoulder. Can you concentrate on that instead of
rambling?



If you continue, that just proves that you are either ignorant or not
listening.

That kind of response is not helping your case. If you believe I'm wrong,
then why not prove me wrong with some hard facts and data?


I already did.
I showed that a MIPS processor with zero context switch cost exists.
You have not shown that zero cost interrupts exist.

Let us go back to the example.

You have a fixed clock.
This is used by a number of SPI masters to provide data to your chip.
Your chip implements SPI slaves and each SPI slave should run
in a separate task/thread or whatever.
The communication on each SPI slave channel is totally different
and should be developed by two teams which do not communicate
with each other and are not aware of each other.
Once per byte, the SPI data is written to memory and
an event flag register private to the thread/interrupt is written.
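
A rough software model of what each SPI slave in this example has to do per
clock edge. This is a sketch under assumptions: MSB-first bit order, one MOSI
sample per clock edge, and the names SpiSlave, clock_edge and bits_msb_first
are invented for illustration. It only models the data path, not the timing
argument.

```python
class SpiSlave:
    """Shifts in one bit per clock edge; once per byte stores it and raises a flag."""

    def __init__(self):
        self.shift = 0
        self.bit_count = 0
        self.memory = bytearray()   # where completed bytes are written
        self.event_flag = 0         # private to this slave's thread/handler

    def clock_edge(self, mosi_bit):
        self.shift = ((self.shift << 1) | (mosi_bit & 1)) & 0xFF
        self.bit_count += 1
        if self.bit_count == 8:
            self.memory.append(self.shift)   # once per byte: write to memory
            self.event_flag = 1              # ... and signal the consumer
            self.shift = 0
            self.bit_count = 0

def bits_msb_first(data):
    """Yield the bits of each byte, most significant bit first (assumed order)."""
    for byte in data:
        for i in range(7, -1, -1):
            yield (byte >> i) & 1

# Two independent slaves driven by the same fixed clock, each with its own data.
slave_a = SpiSlave()
slave_b = SpiSlave()
stream_a = bits_msb_first(b"\xA5\x3C")
stream_b = bits_msb_first(b"\x0F\xF0")
for bit_a, bit_b in zip(stream_a, stream_b):
    slave_a.clock_edge(bit_a)    # both slaves must be serviced on every edge
    slave_b.clock_edge(bit_b)

print(slave_a.memory.hex(), slave_b.memory.hex())
```

The loop makes the constraint visible: on every edge of the shared clock, both
slaves need a slice of CPU time, regardless of what the other one is doing.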

They are aware of the execution environment, which in the interrupt case
is the RTOS and how interrupts are handled.

Using one multithreaded and one interrupt driven processor, with clock
frequencies scaled so that the peak MIPS is equivalent, show that you can
implement the SPI slave.


The issue is replacing multiple CPUs/Memory Subsystems
with a single multithreaded CPU addressing a memory subsystem
consisting of internal TCM memory, internal loosely coupled
memory (flash?) and external memory.

Most realtime CPUs have some form of fast internal memory;
this is not relevant to multithreading.

Eight cores and 16 threads (probably they mean per-core?) is impressive
for what sounds like fairly mainstream cores.

It clearly says 2 threads per core. Any more would be a waste.


Look at Sun and UltraSparc T1, they certainly do not see the boundaries
that you see.

The T1 has tiny caches and stalls on a cache miss unlike any other
high-end out-of-order CPU, so they require more threads to keep going
if one thread stalls. It is also designed for highly multithreaded
workloads, so having more thread contexts means fewer context switches
in software, which can be a big win on workloads running on UNIX/Windows
(realtime OSes are far better at these things).

It is the other way around. *Because* you have many threads you CAN
stall a thread on a cache miss, without affecting the total throughput
of the CPU. It is very likely that the T1 shoves more instructions
per clock cycle than a "high end, branch prediction, out of order" single
or dual thread CPU.
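
A back-of-the-envelope model of that throughput claim, assuming zero-cost
switching between hardware threads and that every thread runs for the same
number of cycles between misses. The function name and the numbers (10 cycles
of work per miss, 30 cycles of miss latency) are made up for illustration and
are not T1 figures.

```python
def pipeline_utilization(threads, run_cycles, miss_latency):
    """Fraction of cycles in which some thread has an instruction to issue.

    Each thread alternates run_cycles of useful work with miss_latency cycles
    of waiting. With zero-cost switching, other threads fill the waiting time
    until the pipeline saturates at 1.0.
    """
    return min(1.0, threads * run_cycles / (run_cycles + miss_latency))

# Hypothetical workload: 10 cycles of work between misses, 30-cycle miss latency.
for n in (1, 2, 4, 8):
    print(n, pipeline_utilization(n, run_cycles=10, miss_latency=30))
```

Under these assumptions a single thread keeps the pipeline busy a quarter of
the time, and four threads are enough to hide the miss latency completely;
beyond that, extra threads add no throughput in this model.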



I do not think that they are limited by Intel's vision...
Also I pointed you at the new MIPS Multithreading core.
They certainly do not agree with you!

If you do not understand the differences between cores like Itanium-2,
Pentium-4, Nehalem, Power5, Power6 (all 2-way multithreaded),
and cores like the T1, MIPS34K and Ubicom (8+ -way threaded),
then you're not the expert on multithreading you claim to be.


You seem to want to slip into a discussion about which type
of CPU will exhibit the highest MIPS rate for a single thread.
That is trying to force open an already open door.


Wilco




--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB

