Re: Handling high UDP throughput
- From: Vladimir Vassilevsky <antispam_bogus@xxxxxxxxxxx>
- Date: Sun, 08 Mar 2009 10:48:42 -0500
Bill A. wrote:
Vladimir Vassilevsky wrote:
Bill A. wrote:
Vladimir Vassilevsky wrote:
The IP/UDP stack at 40MB/s is a substantial computing load. You
need a ~GHz class CPU with the appropriate memory and DMA
subsystems.
You can do 60MB/s easily with TCP to a 500MHz PowerPC even using a
WinXP PC as the host.
Only if you just send the same packet over and over in a dummy loop
and do nothing else.
Actually, my tests sending data and doing nothing with the data got me over 920 Mbit/s. You can't just throw together a system and do this. You won't get that with Linux or any other OS. The product that uses this sustains 540 Mbit/s while a 38 kHz interrupt is running that uses more than half the processor's power, so a lot goes on in the system but a lot of time is still available for TCP/IP. The Ethernet driver was optimized, the memory movement was optimized (just using an inline memcpy that does a DMA transfer adds 30% to the effective speed), the IP checksum was in assembly, and a zero-copy TCP/IP stack was required.
This was with the Freescale QUICC 8349 so I concur with the other post - this processor can do it - it's designed as a communications processor.
A lot depends on what OS and TCP/IP stack are used on
the device, what is done with the data once received, and how much
time you can put into optimizing the system.
I'm not just saying you can do this because I think you can - I've
done it.
I've done 100Mbit Tx/Rx with BlackFin at 600MHz. Even 12/12 MB/s of UDP
traffic is a considerable amount of load. Copying between the
different buffers, calculating the checksums, cache thrashing, etc.:
all of that is not free and hogs the bus and CPU.
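To illustrate the per-byte cost of the checksum step mentioned above, here is a minimal portable sketch of the Internet checksum as described in RFC 1071. It is not the code either poster used (one of them reports an assembly version); the function name inet_checksum is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071): one's-complement sum of 16-bit
 * big-endian words, folded to 16 bits and inverted.
 * Hypothetical helper for illustration. The 32-bit accumulator is
 * safe for buffers up to 64 KiB, which covers any single IP packet.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as a sequence of 16-bit big-endian words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Every byte of every packet passes through a loop like this unless the MAC offloads it, which is one reason the checksum is a common target for hand optimization.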
I didn't say it was easy. I didn't say a system like you used could do it. I'm only saying it is possible in an embedded device with a reasonable processor - you don't need ~GHz as you claimed.
What OS did you use? What stack? How many TX buffers did you have?
Our own OS, our own stack and MAC driver, 4/4 Rx/Tx buffers, 100/100 full duplex. We found that there is generally no advantage in using more than 4 buffers; fewer than 4 buffers decreases the throughput.
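A small ring of receive buffers like the 4-entry arrangement described above could be sketched roughly as follows. This is a software-only model, not either poster's driver: the names rx_desc, rx_ring, ring_put, ring_poll and ring_release, and the 1536-byte buffer size, are all assumptions for illustration. ring_put stands in for the MAC's DMA engine filling a buffer, and the consumer polls the descriptor instead of waiting for an interrupt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4u     /* four buffers, as in the post */
#define BUF_SIZE  1536u  /* assumed: room for one full Ethernet frame */

struct rx_desc {
    volatile bool ready; /* set by the producer when buf holds a frame */
    uint16_t len;
    uint8_t buf[BUF_SIZE];
};

struct rx_ring {
    struct rx_desc desc[RING_SIZE];
    unsigned head;       /* next descriptor the producer fills */
    unsigned tail;       /* next descriptor the consumer polls */
};

/* Producer side (stands in for MAC DMA). Returns false when the ring
 * is full or the frame is too large, i.e. the frame is dropped. */
bool ring_put(struct rx_ring *r, const uint8_t *frame, uint16_t len)
{
    struct rx_desc *d = &r->desc[r->head];

    if (d->ready || len > BUF_SIZE)
        return false;
    memcpy(d->buf, frame, len);
    d->len = len;
    d->ready = true;
    r->head = (r->head + 1u) % RING_SIZE;
    return true;
}

/* Consumer side: poll the oldest descriptor; NULL if nothing arrived. */
struct rx_desc *ring_poll(struct rx_ring *r)
{
    struct rx_desc *d = &r->desc[r->tail];

    return d->ready ? d : NULL;
}

/* Hand the oldest descriptor back to the producer after processing. */
void ring_release(struct rx_ring *r)
{
    r->desc[r->tail].ready = false;
    r->tail = (r->tail + 1u) % RING_SIZE;
}
```

On real hardware the descriptors would live in memory the MAC can reach by DMA, and the ready flag would be an ownership bit defined by the controller; cache maintenance around the buffers is omitted here.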
How fast could the processor get the data to the MAC?
That is done by DMA. The speed depends on many factors.
Did you do zero-copy TCP/IP (it's very hard to do this with sockets)?
No, it has to copy the data. You have to do that not just because of sockets but also because BlackFin has no automatic means of ensuring cache/DMA coherency.
The QUICC buffer descriptor memory makes it very easy to send lots of data without processor intervention. Oh, I forgot, the Ethernet driver I wrote wasn't even interrupt driven.
So, the driver is blocking. No multitasking.
At those interrupt rates there was no improvement over simply polling for data. This may have been because when polling, the processor cache wasn't constantly being replaced by the Ethernet interrupt service routine.
Servicing interrupts at rates up to hundreds of kHz isn't a big problem on BlackFin. The context switch overhead is only ~200ns, and the interrupt code and data are located in L1, so there is no stalling because of the cache.
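As a rough check of the figures above, the arithmetic can be written out as a short sketch. The function names are hypothetical, and the 100 kHz and 500 kHz rates used in the comments are illustrative picks within "hundreds of kHz", not numbers from the thread; the 200 ns overhead and 600 MHz clock are the values quoted in the posts.

```c
/* Fraction of CPU time spent purely on interrupt entry/exit,
 * given the interrupt rate and the per-interrupt overhead.
 * Example: 100 kHz * 200 ns = 0.02 (2%); 500 kHz * 200 ns = 0.10 (10%). */
double irq_overhead_fraction(double rate_hz, double overhead_s)
{
    return rate_hz * overhead_s;
}

/* CPU cycles available between two consecutive interrupts.
 * Example: 600 MHz / 100 kHz = 6000 cycles per interrupt period. */
double cycles_per_interrupt(double cpu_hz, double rate_hz)
{
    return cpu_hz / rate_hz;
}
```

This counts only the context switch itself; the work done inside the handler and any cache effects come on top of it.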
Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com