Automatic parallelization - was Re: LISP Object Oriented?
- From: George Neuner <gneuner2/@comcast.net>
- Date: Tue, 30 Jan 2007 03:34:59 -0500
On 29 Jan 2007 09:06:34 -0800, "mark.hoemmen@xxxxxxxxx"
<mark.hoemmen@xxxxxxxxx> wrote:
On Jan 27, 10:42 pm, George Neuner <gneune...@xxxxxxxxxxx> wrote:
Well ... I don't want to start a war here but it has to be said that
CL's explicit sequencing and guaranteed order of evaluation for
function arguments is a detriment to extracting expression level
parallelism. Source level solutions such as threads, futures, PCALL
and so on are not enough to take advantage of massively parallel
hardware.
No flame wars here, but I am interested in a good discussion :)
I don't have enough experience on the kinds of applications where that
kind of parallelism is useful -- my parallel apps tend to be very
regular. But I'm wondering whether automatic extraction of
parallelism will be that useful in the long run, as communication will
become increasingly the dominant cost.
Keep in mind first that we are talking about shared memory machines
and second that multi-core processors are much more closely coupled
than the parallel CPUs of previous generation machines. Multi-cores
typically share 2nd level cache and have dedicated buses between the
cores for snooping 1st level caches.
WRT automatic extraction of parallelism, studies have shown that the
vast majority of programmers have trouble with parallel programming
using source level methods. The sad fact is that most people find it
difficult to think about simultaneous tasks, are lousy at anticipating
multiple data access problems, and are equally bad at diagnosing and
fixing them. Experience with threading doesn't seem to help those who
don't get it in the first place.
[This I can attest to first hand. I worked for over a decade writing
threaded, real time, 24/7 apps on various platforms from bare metal to
RTOS to (honest to God!) Linux and Wintel boxes. The company I worked
for had enormous trouble finding and keeping good developers. It's a
really big deal when a bug in your app takes down a factory on the
overnight shift.]
We've already got quad-core general purpose CPUs and larger n-ways on
the horizon. Obviously some of the power will go to running multiple
programs simultaneously (or in the case of Vista, just running the
operating system). But the only reasonable way to speed up any single
program is to spread the computations over multiple cores. The
question is how best to do that. I can tell you first hand that
source scheduling units on a modern DSP is tough enough - the coming
multi-core CPUs will do a lot more and effectively source scheduling
them will quickly become impossible.
Threading at the source level doesn't seem likely to go far. Most
tasks are inherently serial at the source level - finding more than a
couple of things to do simultaneously is hard, manually splitting work
for a dozen cores will be incredibly hard. Similarly, PCALL loses its
attraction when you consider that to be effective there can be little
or no interaction between the parallel computations. Additionally,
both threads and PCALL are sequenced explicitly by the programmer.
Futures are more interesting. Most people think of futures in
connection with lazy programming and demand computation, but a future
could be aggressively evaluated in parallel prior to need whenever the
future's preconditions are met - so in effect a future could be
thought of as an implicitly defined, demand scheduled thread.
However, source level futures as in Lisp/Scheme are too course grained
and evaluation is under the control of the programmer. Because they
are thought of mostly for lazy evaluation where the preconditions are
expected to be in place at the time of evaluation, futures are not
typically written in a way that is conducive to parallel evaluation
(though the compiler could supplement the future with the necessary
precondition tests).
A lazy language may create a future implicitly whenever a computation
is non-strict in its parameters. These implicit futures theoretically
may be as fine grained as primitive operations (they aren't in current
FPLs). However, the current crop of lazy languages implement a strict
serial evaluation model - futures are computed on demand only when
their values are needed.
In scientific computing we've been using "source level solutions" for
quite some time now -- if you count source-to-source translators as
"source level solutions" ;P That solution scales to thousands of
processors. Scaling is partially a system question but also partially
an algorithm question; programmers have to be trained to think "in
parallel."
Yes, but scientific programming is a small percentage of all
programming and big regular, easily decoupled problems are a minority
even there. Problems which involve significant communication are
already difficult to program effectively for large scale, separate
memory machines.
Potential parallelism in a function call is only worth extracting if
the data are sufficiently large and already distributed across the
processors. Otherwise the communication costs of scattering and
gathering the data won't justify parallelization.
The cost of moving computations in a tightly coupled, shared memory
system is quite low. I don't have a link handy at the moment, but I
read a study last year indicating that striping loop iterations (as
opposed to unrolling them on a single processor) can already be
profitable on some multi-core systems.
I can envision a distributed solution based on aggressive evaluation
of futures - with a runtime similar to parallel OPS5 but starting from
any language with call-by-need semantics. My solution would work only
for a small number of cores (maybe 8..16), but it would scale a
program automatically within its limits.
George
--
for email reply remove "/" from address
.
- Follow-Ups:
- Re: Automatic parallelization - was Re: LISP Object Oriented?
- From: Marcus Breiing
- Re: Automatic parallelization - was Re: LISP Object Oriented?
- From: Tim Bradshaw
- Re: Automatic parallelization - was Re: LISP Object Oriented?
- Prev by Date: Re: (case 'quote ('lambda 1) (otherwise 2))
- Next by Date: Re: (case 'quote ('lambda 1) (otherwise 2))
- Previous by thread: Replacing Xaml with Lisp on Windows Vista
- Next by thread: Re: Automatic parallelization - was Re: LISP Object Oriented?
- Index(es):