Re: Parallel programming with FORALL?

On Jan 12, 12:18 am, Dick Hendrickson <dick.hendrick...@xxxxxxx> wrote:
> Yes, that's the way it works.  One statement is completely processed
> before the next one is started.  (Of course, the compiler can optimize
> around this if things are safe.)  This is different from DO loops.  If
> your FORALLs were naively replaced with DO loops you'd get a completely
> different answer.

Yeah. I wrote it intentionally like that (mixing indices) because I
wanted to see that feature in action. It's interesting that you can do
things like that. Like, you could implement a transpose with:

forall (i = 1:N, j = 1:N) B(i,j) = B(j,i)

I think it's an interesting feature.
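
To make the difference concrete, here is a minimal sketch that runs the same transpose assignment once as a FORALL and once as naively nested DO loops. The program name, the array size N = 3, and the fill pattern 10*i + j are assumptions chosen only for illustration.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: N = 3        ! illustrative size (assumption)
  integer :: A(N,N), B(N,N), i, j

  ! Fill both arrays identically: element (i,j) holds 10*i + j.
  do j = 1, N
    do i = 1, N
      A(i,j) = 10*i + j
    end do
  end do
  B = A

  ! FORALL: all right-hand sides are evaluated before any element is
  ! assigned, so this transposes A in place.
  forall (i = 1:N, j = 1:N) A(i,j) = A(j,i)

  ! Naive DO loops: later iterations read elements that earlier
  ! iterations have already overwritten.
  do j = 1, N
    do i = 1, N
      B(i,j) = B(j,i)
    end do
  end do

  print '(a)', 'FORALL result:'
  do i = 1, N
    print '(3i4)', A(i,:)
  end do

  print '(a)', 'Naive DO result:'
  do i = 1, N
    print '(3i4)', B(i,:)
  end do
end program forall_vs_do
```

With the FORALL, A ends up as the transpose of its original contents. With the DO loops, the lower triangle of B is overwritten from the upper triangle first, and the upper triangle is then copied back from those new values, so B ends up symmetric rather than transposed.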

> The F2008 DO CONCURRENT is sort of a replacement for FORALL.  The
> programmer guarantees that there is complete iteration-to-iteration
> order independence in the loop.  The compiler can freely spread the
> execution across many CPUs.  The loss is that DO CONCURRENT wouldn't
> work for your example.

Sounds interesting. Too bad it's not implemented in GFortran.
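
For a compiler that does support F2008, a DO CONCURRENT version might look like the sketch below. It assumes the transpose is written into a second array, so that each iteration reads only A and writes only its own element of B; the in-place form B(i,j) = B(j,i) would break the independence guarantee, because one iteration would read an element that another iteration defines. The program name, N = 3, and the fill pattern are illustrative assumptions.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: N = 3        ! illustrative size (assumption)
  integer :: A(N,N), B(N,N), i, j

  ! Fill the source array: element (i,j) holds 10*i + j.
  do j = 1, N
    do i = 1, N
      A(i,j) = 10*i + j
    end do
  end do

  ! Each iteration writes exactly one element of B and reads only A,
  ! so the iterations may be executed in any order.
  do concurrent (i = 1:N, j = 1:N)
    B(i,j) = A(j,i)
  end do

  ! Compare against the intrinsic TRANSPOSE as a sanity check.
  print *, 'matches transpose(A): ', all(B == transpose(A))
end program do_concurrent_sketch
```

Keeping the source and destination arrays separate is what lets the programmer honestly make the order-independence promise described above.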

