Re: I need some guideance regarding parallel processing
- From: Gordon Sande <g.sande@xxxxxxxxxxxxxxxx>
- Date: Sat, 03 Feb 2007 15:09:21 GMT
On 2007-02-03 00:37:36 -0400, "tripp" <tripplowe@xxxxxxxxx> said:
> Hey Folks,
> I am involved in a large image processing project at work. I am using
> Lahey FORTRAN v6.2 pro on Linux (I have 6 machines for sure and
> possibly a whole lab full of diskless clients awaiting).
> I'll be looping through large arrays (~36000000 by >6), basically
> comparing each element with ALL other elements in the array. Initial
> runs suggest it will take about 600 days to complete (a large part of
> this is due to my programming; I'm working on optimizing the code)...
You provide no hint of why you need to compare ALL pairs. If the
comparisons are in any sense metric-based, then you need to find out about
nearest-neighbour searching, range searching, associative searching, and
all the other variants on the general theme that have been developed in
the many years since computer science became concerned with efficiency.
The abstract field is called concrete computational complexity. Searching
is a well-worked subfield, and parallel and distributed searching have
also been considered.
If you are doing metric comparisons, then a typical algorithm of this
class will have an n log(n) setup cost with a log(n) cost for
each query. That makes finding the nearest neighbours of all n elements
only n log(n) in total. Even lousy code of that kind will run circles
around a highly polished n^2 code at your sizes.
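To make the setup-then-query idea concrete, here is a minimal sketch of one such structure, a k-d tree. It is written in Python rather than Fortran purely for brevity, and it is an illustration under assumptions, not a recommendation for this particular data. The names build_kdtree and nearest are made up for the example, and the distance is assumed to be plain Euclidean. Re-sorting at every level makes this build O(n log^2 n); a median-selection build would be needed to reach n log(n). Queries cost about log(n) on average for well-spread points in few dimensions, and that degrades as the dimension grows.

```python
import random

def build_kdtree(points, depth=0):
    """Build a k-d tree as nested tuples: (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median point splits the set
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return (squared_distance, point) for the stored point closest to query."""
    if node is None:
        return best
    point, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or d2 < best[0]:
        best = (d2, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far; this pruning is what avoids the full scan.
    if diff * diff < best[0]:
        best = nearest(far, query, depth + 1, best)
    return best

# Small demonstration on made-up random data (not the poster's images).
random.seed(1)
sample = [tuple(random.random() for _ in range(3)) for _ in range(500)]
tree = build_kdtree(sample)
dist2, match = nearest(tree, (0.5, 0.5, 0.5))
print("nearest point:", match, "squared distance:", dist2)
```

To find the nearest neighbour of every stored element, each element would be used as a query in turn, with a small change to skip the element itself, which otherwise matches at distance zero.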
Any effort spent optimizing code before you have chosen a better
algorithm is a complete waste of time, unless you are a salesman
selling machine time. It is hard to be more emphatic on the issue of
getting good algorithms before you try to optimize things.
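For a rough sense of scale, the arithmetic can be sketched as follows (in Python, with constant factors ignored, so these are counts of abstract steps and not run times). The only input is the ~36000000 element count from the question; the helper names are hypothetical.

```python
import math

def all_pairs_steps(n):
    """Comparisons needed to visit every unordered pair exactly once."""
    return n * (n - 1) // 2

def tree_search_steps(n):
    """Rough n*log2(n) count for one build plus n queries; constants ignored."""
    return n * math.log2(n)

n = 36_000_000  # approximate element count quoted in the question
brute = all_pairs_steps(n)
tree = tree_search_steps(n)
print(f"all pairs : {brute:.2e} steps")
print(f"n log2(n) : {tree:.2e} steps")
print(f"ratio     : {brute / tree:.1e}")
```

Even a tenfold speedup from hand tuning leaves the all-pairs count near 6.5e13, still far above the n log(n) count, which is the point about choosing the algorithm first.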
If your comparisons are not metric-based, then maybe you need to try
for a more general notion of metric. If that also fails, then go and talk
to a complexity specialist to see what is known about your problem.
If you work on the algorithms, then you will be able to move on to REALLY BIG
problems, and you will not even have to buy your own computer company.
> Regardless of my programming skills, I need some guidance on what type
> of distributed system I should implement. In a previous post on this
> list, I was turned on to OpenMP. But I am open to learning MPI, PVM,
> etc. if that will facilitate getting the job done.
> I guess my question to you folks working with large datasets is:
> 1) What standard have you implemented to network machines together?
> 2) What message passing standard do you use?
> 3) Other than 'turn back now', do you have any other guidance on how I
> should set up the cluster?
> I am open to any and all suggestions.
> Thanks for your time.
> Tripp
> GIS Lab Manager
> The Univ. of GA
> WSFNR