Re: single thread per connection



On Sat, 12 Jul 2008 10:25:52 -0700, Neil Coffey <neil.coffey@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Peter Duniho wrote:

On 32-bit Windows, the theoretical maximum number of threads per process is about 2000, with the practical maximum somewhat lower, and performance suffering significantly before that. Unix/Linux would be different...threads are much lighter-weight constructs on those OSs.

Hmm that's interesting -- I wasn't aware of this 2000 limit. FWIW, in a
quick test I can start up about 5,600 threads in XP before I get a
"Cannot start new native thread" error, but I'd concede that's more or
less a limit within the order you say.

Is that in Java? The limit comes from the size of the stack allocated for a thread (1MB by default) and the maximum virtual address space for a process (2GB). If you use a different stack size than the default, or don't actually allocate one OS thread per Java thread, then the actual limit would be different.

Other than that it's a hard limit, and in practice you can't even reach that maximum because the process's virtual address space will include other things that prevent the entire space from being used for thread stacks.
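To make the arithmetic concrete, here is a rough sketch in Java. The class and method names are made up for illustration, and the 2GB and 1MB figures are simply the ones quoted above, not values queried from the OS. It also shows the four-argument Thread constructor, which lets you request a stack size; the JVM is allowed to treat that value as a hint, so the effect is platform-dependent.

```java
// Back-of-the-envelope estimate only. ThreadLimitSketch and maxThreads are
// hypothetical names; the sizes are the figures quoted in this thread.
class ThreadLimitSketch {
    /** Upper bound on threads if the whole address space held nothing but thread stacks. */
    static long maxThreads(long addressSpaceBytes, long stackBytes) {
        return addressSpaceBytes / stackBytes;
    }

    public static void main(String[] args) throws InterruptedException {
        long userSpace = 2L * 1024 * 1024 * 1024; // 2GB of process address space (32-bit Windows)
        long defaultStack = 1024 * 1024;          // 1MB default stack per thread

        System.out.println(maxThreads(userSpace, defaultStack)); // 2048
        System.out.println(maxThreads(userSpace, 256 * 1024));   // 8192 with a 256KB stack

        // Requesting a smaller stack for one thread. The JVM may round this
        // value up or down, or ignore it entirely.
        Thread t = new Thread(null, () -> {}, "small-stack", 256 * 1024);
        t.start();
        t.join();
    }
}
```

By the same arithmetic, a count like 5,600 threads would be consistent with something under 400KB of stack per thread, though that is only an inference from the division, not a measured figure.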

Although it's a bit bonkers
to want so many threads, it's also kind of disappointing that you
absolutely can't. I guess Windows uses a bit more memory than you'd
think for each thread control/environment block.

As I mention above, the main limit is the stack for each thread. They are also relatively "heavyweight" as compared to Unix threads for other reasons, but the main limit with respect to the maximum number of threads is simply address space.

I'm also intrigued by another thing: if I understand you correctly,
you're saying that there's a performance difference between having
X threads all from one process and having X threads distributed
across various processes.

No, that's not what I am saying. But the large number of threads you might see in the overall OS is not nearly the problem that an even larger number would be in a single process.

For one, if there are even 200 threads over all the basic processes, then you can see that 2000 threads would increase the load by 10x. In addition, the vast majority of threads in the basic processes aren't actually doing much; some may go minutes if not hours without becoming runnable. On the other hand, an application that is actively servicing 500 tasks (such as telnet connections) will be constantly switching between all those threads.

Even if each thread uses the entire quantum granted it, that could be an issue, and the fact is for an i/o-bound application, it's very common for each thread to use only a portion of its quantum. Context-switching can become a very significant component of the overall CPU cost.

Basically, it's not just the number of threads that's an issue. It's what those threads are doing. The basic set of threads one sees in a just-booting OS behaves quite a bit differently than a similar or larger number of threads in a single process might.

I'm less familiar with other OSes, and even with what's available in Java. But in Windows, both in the regular Win32 API and under .NET, there are i/o mechanisms that allow a single thread to service an arbitrarily large number of i/o tasks (e.g. telnet connections). This allows a program to create just enough threads to keep all the CPU cores busy. The Windows scheduler also knows to treat those threads specially: if the only other runnable thread is one that would do the same work as the currently running thread, the currently running thread is allowed to just keep running, rather than being preempted for no good reason.

It's possible that Java's NIO classes use this mechanism on Windows (known as "I/O Completion Ports"). I don't really know. But any technique that minimizes the need for context switches between a large number of threads that do relatively little work during each of their quanta has the potential for improving performance beyond the naïve "one thread per connection" approach. You mention that the NIO package does improve performance, so presumably "under the hood" it's doing _something_ along these lines.
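For comparison with "one thread per connection", here is a minimal sketch of a single thread servicing many connections through a java.nio Selector. The class name SelectorEchoSketch and its helper methods are invented for this example, and it says nothing about what the Selector uses underneath on any given OS. It is an echo server with several simplifications: it binds to loopback on an OS-chosen port, and an i/o error on any one client stops the whole loop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical example class: one thread multiplexes every connection.
class SelectorEchoSketch implements Runnable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();

    SelectorEchoSketch() throws IOException {
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: OS picks a free port
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // sleeps until some channel is ready
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        buf.clear();
                        if (ch.read(buf) < 0) { ch.close(); continue; } // peer closed
                        buf.flip();
                        // Simplification: spins if the socket's send buffer is full.
                        while (buf.hasRemaining()) ch.write(buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector closed or i/o failure: leave the loop.
        }
    }

    void stop() throws IOException {
        selector.close(); // also wakes up a blocked select()
        server.close();
    }

    /** Blocking test client: sends msg and returns whatever is echoed back. */
    static String roundTrip(int port, String msg) throws IOException {
        byte[] out = msg.getBytes(StandardCharsets.US_ASCII);
        try (SocketChannel c = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
            ByteBuffer send = ByteBuffer.wrap(out);
            while (send.hasRemaining()) c.write(send);
            ByteBuffer reply = ByteBuffer.allocate(out.length);
            while (reply.hasRemaining() && c.read(reply) >= 0) { }
            return new String(reply.array(), 0, reply.position(), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws Exception {
        SelectorEchoSketch srv = new SelectorEchoSketch();
        Thread loop = new Thread(srv, "selector-loop");
        loop.start();
        System.out.println(roundTrip(srv.port(), "hello"));
        srv.stop();
        loop.join();
    }
}
```

The point of the design is that the thread count stays at one no matter how many clients connect, so there is no per-connection stack and no switching between mostly idle threads. A real server would also register for OP_WRITE instead of spinning on a full send buffer.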

Pete