Re: OpenMP "not working" on gfortran



On Feb 14, 10:07 am, Paul Anton Letnes <paul.anton.let...@xxxxxxxxx>
wrote:
On 14.02.11 13.58, gmail-unlp wrote:

On Feb 14, 9:18 am, Paul Anton Letnes<paul.anton.let...@xxxxxxxxx>
wrote:
On 14.02.11 12.10, gmail-unlp wrote:

On Feb 12, 6:00 am, Paul Anton Letnes<paul.anton.let...@xxxxxxxxx>
wrote:
On 11.02.11 21.29, FX wrote:

Has anyone experienced similar issues with OpenMP, or can you suggest
somewhere to start looking for the trouble?

Try a simple OpenMP program (like a single for loop), see if you can
reproduce the issue with that, then post here your compiler version
(output of "gfortran -v") and the exact code and command line you are
using.

This is part of the problem: the simple OpenMP loop runs on all cores.
Also, the OpenMP code I wrote works on Mac OS X gfortran 4.5.2. The
OpenMP code also works flawlessly on Rocks cluster linux with intel
fortran v. 11.

I have now tested on gfortran 4.4.5 and 4.5.1 on Fedora Linux release
14, and both of these compilers result in no speedup. I know I can't
expect anyone here to find my problem (as I have no idea where to look
myself and a simple program doesn't reproduce the error), but it would
be interesting to see if someone here has had the same experience.

Paul

Hi,

Can you post the single loop so I can compile-link-run?

Fernando.

Sure! Just see below. The outer loop in a double loop is
OpenMP-parallelized. The function generate_element contains a function
generate_j which also contains a loop, so significant work is done even
in the innermost loop. Unfortunately, this loop is way too short (only
10 iterations) to usefully parallelize. I guess it would be easier to
debug that way, though.

Cheers,
Paul.

----------------------
subroutine setup_LHS(...)
! LHS is the left hand side of an equation system

(snip)

!$omp parallel do&
!$omp private(icol, irow, til, ip1, ip2, fra, iq1, iq2, alpha_0)&
!$omp firstprivate(qs, numerics, imin, imax, reverse_lookup, alpha_0s, alphas)&
!$omp shared(LHS, fft_of_zetas)
do icol = imin, imax
      if (icol == 1) then
          print *, 'omp num threads:', omp_get_num_threads() ! TODO Remove
      end if
      fra = reverse_lookup(icol, 1)
      iq1 = reverse_lookup(icol, 2)
      iq2 = reverse_lookup(icol, 3)
      alpha_0 = alpha_0s(iq1, iq2)
      do irow = 1, size(LHS, 1)
          til = reverse_lookup(irow, 1)
          ip1 = reverse_lookup(irow, 2)
          ip2 = reverse_lookup(irow, 3)
          LHS(irow, icol) = generate_element(pm_lhs, fra, til, iq1,&
              iq2, ip1, ip2, numerics, alphas(ip1, ip2),&
              alpha_0, fft_of_zetas, qs)
      end do
end do
!$omp end parallel do

Sorry, I don't have enough time to "complete" the code with
declarations/initializations/etc.; that's why I need (and asked for)
some code I can compile, link, run and play with almost directly from
the command line. Maybe you can just assign a constant to LHS to
check for parallel behavior...
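
A minimal sketch of such a constant-assignment test might look like the following. Everything in it is made up for illustration (the program name, the array size n, the constant 1.0d0) and is not taken from the original setup_LHS; only the loop structure and the thread-count print mirror the posted code.

```fortran
! Hypothetical stand-alone test: names and sizes are illustrative,
! not taken from the original code.
program lhs_constant_test
    use omp_lib
    implicit none
    integer, parameter :: n = 2000          ! assumed size, pick anything
    double precision, allocatable :: LHS(:, :)
    integer :: irow, icol
    double precision :: t0, t1

    allocate(LHS(n, n))

    t0 = omp_get_wtime()
    !$omp parallel do private(icol, irow) shared(LHS)
    do icol = 1, n
        if (icol == 1) then
            ! Printed once, by whichever thread handles icol == 1
            print *, 'omp num threads:', omp_get_num_threads()
        end if
        do irow = 1, n
            LHS(irow, icol) = 1.0d0         ! constant instead of generate_element
        end do
    end do
    !$omp end parallel do
    t1 = omp_get_wtime()

    print *, 'sum(LHS) =', sum(LHS), ' wall time (s):', t1 - t0
end program lhs_constant_test
```

It should build with something like "gfortran -fopenmp lhs_constant_test.f90 -o lhs_constant_test", and the thread count can be varied through the OMP_NUM_THREADS environment variable. Since a plain assignment is mostly limited by memory bandwidth, the reported thread count (and the CPU usage seen in top) is probably a more useful signal here than the wall time.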

Looking at the previous posts, I was thinking along the lines of OpenMP
implementation/OS problems. Did you see anything related to taskset?

Fernando.

I see, and I understand. However, the code is a bit complex. This is
probably part of the reason for the strange behavior! As mentioned
earlier, a simple do loop example compiles, runs and speeds up as expected.

I will consider your advice with respect to just assigning a constant to
LHS and see if that works out. I think it could be helpful!

I have never used taskset, so I have no idea how I would go about using
it in this context.

Paul.

I see, no problem. From a previous post (I copy the text here):

"This is part of the problem: the simple OpenMP loop runs on all cores.
Also, the OpenMP code I wrote works on Mac OS X gfortran 4.5.2. The
OpenMP code also works flawlessly on Rocks cluster linux with intel
fortran v. 11.
I have now tested on gfortran 4.4.5 and 4.5.1 on Fedora Linux release
14, and both of these compilers result in no speedup."

I understood that the simple loop did not work on Fedora. My mistake, sorry.

About code complexity and parallel performance: I think code
complexity only "affects" correctness, and should not by itself restrict
execution to a single core. But obviously there are a lot of
details/facts I don't know.

About taskset: since I was thinking of OpenMP implementation/OS
problems, I suggested taking a look at taskset, which is used to
"retrieve or set a process's CPU affinity" (copied from
http://www.unix.com/man-page/Linux/1/taskset/).
Since threads (I think always) inherit scheduling properties from the
process in which they are created, maybe you can verify/play with the
process scheduling properties. But it's just a guess...
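
One way to check the affinity from inside a Fortran program, as a rough sketch: on Linux, the file /proc/self/status normally contains a "Cpus_allowed_list" line giving the CPUs the process may run on. The program name and the unit number below are arbitrary choices for illustration, and this is only an assumption about where to look, not a confirmed diagnosis of the problem.

```fortran
! Hypothetical helper: prints the CPU affinity list of this process.
! Linux-only; relies on the Cpus_allowed_list field of /proc/self/status.
program show_affinity
    implicit none
    character(len=256) :: line
    integer, parameter :: u = 10            ! arbitrary unit number
    integer :: ios

    open(unit=u, file='/proc/self/status', status='old', &
         action='read', iostat=ios)
    if (ios /= 0) then
        print *, 'could not open /proc/self/status (not Linux?)'
        stop
    end if

    do
        read(u, '(A)', iostat=ios) line
        if (ios /= 0) exit                  ! end of file or read error
        if (index(line, 'Cpus_allowed_list') == 1) print *, trim(line)
    end do
    close(u)
end program show_affinity
```

If the list shows a single CPU when the program is started the same way as the slow OpenMP run, all threads would be confined to that core. For comparison, starting it as "taskset -c 0 ./show_affinity" should restrict it to CPU 0, and "taskset -p" followed by a process ID reports the affinity mask of an already running process.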

Fernando.
