Re: 2.6, 3.0, and truly independent intepreters
- From: Rhamphoryncus <rhamph@xxxxxxxxx>
- Date: Thu, 23 Oct 2008 14:24:50 -0700 (PDT)
On Oct 23, 11:30 am, Glenn Linderman <v+pyt...@xxxxxxxxxxxx> wrote:
On approximately 10/23/2008 12:24 AM, came the following characters from
the keyboard of Christian Heimes:
Andy wrote:
2) Barriers to "free threading". As Jesse describes, this is simply
just the GIL being in place, but of course it's there for a reason.
It's there because (1) doesn't hold and there was never any specs/
guidance put forward about what should and shouldn't be done in multi-
threaded apps (see my QuickTime API example). Perhaps if we could go
back in time, we would not put the GIL in place, strict guidelines
regarding multithreaded use would have been established, and PEP 3121
would have been mandatory for C modules. Then again--screw that, if I
could go back in time, I'd just go for the lottery tickets!! :^)
I've been following this discussion with interest, as it certainly seems
that multi-core/multi-CPU machines are the coming thing, and many
applications will need to figure out how to use them effectively.
I'm very - not absolute, but very - sure that Guido and the initial
designers of Python would have added the GIL anyway. The GIL makes
Python faster on single core machines and more stable on multi core
machines. Other language designers think the same way. Ruby recently
got a GIL. The article
http://www.infoq.com/news/2007/05/ruby-threading-futures explains the
rationales for a GIL in Ruby. The article also holds a quote from
Guido about threading in general.
Several people inside and outside the Python community think that
threads are dangerous and don't scale. The paper
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf sums
it up nicely. It explains why modern processors are going to cause
more and more trouble with the Java approach to threads, too.
Reading this PDF paper is extremely interesting (albeit somewhat
dependent on understanding abstract theories of computation; I have
enough math background to follow it, sort of, and most of the text can
be read even without fully understanding the theoretical abstractions).
I have already heard people talking about "Java applications are
buggy". I don't believe that general sequential programs written in
Java are any buggier than programs written in other languages... so I
had interpreted that to mean (based on some inquiry) that complex,
multi-threaded Java applications are buggy. And while I also don't
believe that complex, multi-threaded programs written in Java are any
buggier than complex, multi-threaded programs written in other
languages, it does seem to be true that Java is one of the currently
popular languages in which to write complex, multi-threaded programs,
because of its language support for threads and concurrency primitives.
These reports were from people who are not programmers but field IT
people, who have bought and/or support software and/or hardware with
drivers written in Java, which seem to have non-ideal behavior that is
(apparently only) curable by stopping/restarting the application or
driver, or sometimes by a reboot.
The paper explains many traps that lead to complex, multi-threaded
programs being buggy, and being hard to test. I have worked with
parallel machines, applications, and databases for 25 years, and can
appreciate the succinct expression of the problems explained within the
paper, and can, from experience, agree with its premises and
conclusions. Parallel applications have been commercial successes only
when the parallelism is tightly constrained to well-controlled patterns
that could be easily understood. Threads, especially in "cooperation"
with languages that use memory pointers, have the potential to get out
of control, in inexplicable ways.
Although the paper is correct in many ways, I find it fails to
distinguish the core of the problem from the chaff surrounding it, and
thus is used to justify poor language designs.
For example, the amount of interaction may be seen as a spectrum: at
one end is C or Java threads, with complicated memory models, and a
tendency to just barely control things using locks. At the other end
would be completely isolated processes with no form of IPC. The former
is considered the worst possible, while the latter is the best
possible (purely sequential).
However, the latter is too weak for many uses. At a minimum we'd like
some pipes to communicate. Helps, but it's still too weak. What if
you have a large amount of data to share, created at startup but
otherwise not modified? So we add some read only types and ways to
define your own read only types. A couple of those types need a
process associated with them, so we make sure process handles are
proper objects too.
What have we got now? It's more on the thread end of the spectrum
than the process end, but it's definitely not a C or Java thread, and
it's definitely not an OS process. What is it? Does it have the
problems in the paper? Only some? Which?
Another peeve I have is his characterization of the observer pattern.
The generalized form of the problem exists in both single-threaded
sequential programs, in the form of unexpected reentrancy, and message
passing, with infinite CPU usage or infinite number of pending
messages.
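
To illustrate the single-threaded form of the problem, here is a minimal
Python sketch of unexpected reentrancy in an observer. The names Subject,
set_value, clamp_to_ten, and run_demo are hypothetical and are not taken
from the paper; this is one possible way the problem can show up, not a
reproduction of the paper's example.

```python
class Subject:
    """Toy observable value; every name here is illustrative only."""

    def __init__(self):
        self.value = 0
        self.listeners = []
        self.log = []  # (value being announced, value actually stored)

    def set_value(self, value):
        self.value = value
        # Iterate over a snapshot so listeners could attach/detach safely.
        for listener in list(self.listeners):
            listener(self, value)
            # If the listener called set_value() itself, self.value may no
            # longer be the value this call was in the middle of announcing.
            self.log.append((value, self.value))


def clamp_to_ten(subject, value):
    # A listener that modifies the subject it observes: the reentrant call.
    if value > 10:
        subject.set_value(10)


def run_demo():
    subject = Subject()
    subject.listeners.append(clamp_to_ten)
    subject.set_value(42)
    return subject.value, subject.log
```

No threads or locks are involved, yet the outer set_value(42) call ends up
announcing a value that is no longer stored. A listener that always called
set_value() with a fresh value would recurse without bound, which is roughly
the sequential counterpart of the endless stream of pending messages in the
message-passing case.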
Perhaps threading makes it much worse; I've heard many anecdotes that
would support that. Or perhaps it's the lack of automatic deadlock
detection, giving a clear and diagnosable error for you to fix.
Certainly, the mystery and extremeness of a deadlock could explain how
much it scares people. Either way the paper says nothing.
Python *must* gain means of concurrent execution of CPU bound code
eventually to survive on the market. But it must get the right means
or we are going to suffer the consequences.
This statement, after reading the paper, seems somewhat in line with the
author's premise that language acceptability requires that a language be
self-contained/monolithic, and potentially sufficient to implement
itself. That seems to also be one of the reasons that Java is used
today for threaded applications. It does seem to be true, given current
hardware trends, that _some mechanism_ must be provided to obtain the
benefit of multiple cores/CPUs to a single application, and that Python
must either implement or interface to that mechanism to continue to be a
viable language for large scale application development.
Andy seems to want an implementation of independent Python processes
implemented as threads within a single address space, that can be
coordinated by an outer application. This actually corresponds to the
model promulgated in the paper as being most likely to succeed. Of
course, it maps nicely into a model using separate processes,
coordinated by an outer process, also. The differences seem to be:
1) Most applications are historically perceived as corresponding to
single processes. Language features for multi-processing are rare, and
such languages are not in common use.
2) A single address space can be convenient for the coordinating outer
application. It does seem simpler and more efficient to simply "copy"
data from one memory location to another, rather than send it in a
message, especially if the data are large. On the other hand,
coordination of memory access between multiple cores/CPUs effectively
causes memory copies from one cache to the other, and if memory is
accessed from multiple cores/CPUs regularly, the underlying hardware
implements additional synchronization and copying of data, potentially
each time the memory is accessed. Being forced to do message passing of
data between processes can actually be more efficient than access to
shared memory at times. I should note that in my 25 years of parallel
development, all the systems created used a message passing paradigm,
partly because the multiple CPUs often didn't share the same memory
chips, much less the same address space, and that a key feature of all
the successful systems of that nature was an efficient inter-CPU message
passing mechanism. I should also note that Herb Sutter has a recent
series of columns in Dr. Dobb's regarding multi-core/multi-CPU parallelism
and a variety of implementation pitfalls, that I found to be very
interesting reading.
Try looking at it on another level: when your CPU wants to read from a
bit of memory controlled by another CPU it sends them a message
requesting they get it for us. They send back a message containing
that memory. They also note we have it, in case they want to modify
it later. We also note where we got it, in case we want to modify it
(and not wait for them to do modifications for us).
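
The exchange described above can be sketched as a toy model in Python. This
is only an illustration of the "memory access as message passing" view, not
a real coherence protocol; the class Cpu and its methods handle_read, read,
and write_local are hypothetical names invented for the sketch.

```python
class Cpu:
    """Toy CPU that controls some memory and caches lines fetched remotely."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = dict(memory)  # addresses this CPU controls
        self.cache = {}             # address -> (owner, value) fetched remotely
        self.sharers = {}           # address -> names of CPUs holding a copy

    def handle_read(self, requester, address):
        # The owner notes who has a copy, in case it modifies the data later.
        self.sharers.setdefault(address, set()).add(requester.name)
        return self.memory[address]

    def read(self, owner, address):
        if address not in self.cache:
            # Request message to the owner, reply message with the data.
            value = owner.handle_read(self, address)
            # The requester notes where the copy came from.
            self.cache[address] = (owner, value)
        return self.cache[address][1]

    def write_local(self, cpus_by_name, address, value):
        # The owner modifies its memory and tells each sharer to drop
        # its now-stale copy.
        self.memory[address] = value
        for name in self.sharers.pop(address, set()):
            cpus_by_name[name].cache.pop(address, None)
```

With a = Cpu("A", {0x10: 1}) and b = Cpu("B", {}), a call to b.read(a, 0x10)
fetches the value once and serves later reads from b's cache until
a.write_local(...) invalidates the copy, at which point the next read sends
a fresh request.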
Message passing vs shared memory isn't really a yes/no question. It's
about ratios, usage patterns, and tradeoffs. *All* programs will
share data, but in what way? If it's just the code itself you can
move the cache validation into software and simplify the CPU, making
it faster. If the shared data is a lot more than that, and you use it
to coordinate accesses, then it'll be faster to have it in hardware.
It's quite possible they'll come up with something that seems quite
different, but in reality is the same sort of rearrangement. Add
hardware support for transactions, move the caching partly into
software, etc.
I have noted the multiprocessing module that is new to Python 2.6/3.0
being feverishly backported to Python 2.5, 2.4, etc... indicating that
people truly find the model/module useful... seems that this is one way,
in Python rather than outside of it, to implement the model Andy is
looking for, although I haven't delved into the details of that module
yet, myself. I suspect that a non-Python application could load one
embedded Python interpreter, and then indirectly use the multiprocessing
module to control other Python interpreters in other processes. I
don't know that multithreading primitives such as described in the paper
are available in the multiprocessing module, but perhaps they can be
implemented in some manner using the tools that are provided; in any
case, some interprocess communication primitives are provided via this
new Python module.
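
As a rough idea of what those interprocess communication primitives look
like, here is a minimal sketch using multiprocessing.Process and a Pipe.
The functions worker and sum_of_squares_in_child are hypothetical names for
this example, and the explicit "fork" start method is an assumption that
limits the sketch to POSIX systems.

```python
import multiprocessing as mp


def worker(conn):
    # Runs in a separate process, with its own interpreter and its own GIL.
    numbers = conn.recv()  # blocks until the parent sends the data
    conn.send(sum(n * n for n in numbers))
    conn.close()


def sum_of_squares_in_child(numbers):
    # Assumption: the "fork" start method (POSIX only) keeps the sketch
    # short. On other platforms, use the default start method and call
    # this function under an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn,))
    proc.start()
    parent_conn.send(list(numbers))  # data is pickled and copied, not shared
    result = parent_conn.recv()
    proc.join()
    return result
```

The data crosses the process boundary as a pickled message, so nothing is
shared between the two interpreters; that is the message-passing end of the
tradeoff discussed earlier in the thread.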
There could be opportunity to enhance Python with process creation and
process coordination operations, rather than have it depend on
easy-to-implement-incorrectly coordination patterns or
easy-to-use-improperly libraries/modules of multiprocessing primitives
(this is not a slam of the new multiprocessing module, which appears to
be filling a present need in rather conventional ways, but just to point
out that ideas promulgated by the paper, which I suspect 2 years later
are still research topics, may be a better abstraction than the
conventional mechanisms).
One thing Andy hasn't yet explained (or I missed) is why any of his
application is coded in a language other than Python. I can think of a
number of possibilities:
A) (Historical) It existed, then the desire for extensions was seen, and
Python was seen as a good extension language.
B) Python is inappropriate (performance?) for some of the algorithms
(but should they be coded instead as Python extensions, with the core
application being in Python?)
C) Unavailability of Python wrappers for particularly useful 3rd-party
libraries
D) Other?
"It already existed" is definitely the original reason, but now it
includes single-threaded performance and multi-threaded scalability.
Although the idea of "just write an extension that releases the GIL"
is a common suggestion, it needs to be fairly coarse to be effective,
and ensure little of the CPU time is left in Python. If the app
spreads its CPU time around, it is likely impossible to use Python
effectively.