Re: Improving multi-dimension array access performance



Rajorshi Biswas wrote:

We have a large Fortran code base which uses quite a lot of four
dimensional arrays. Upon profiling the performance, we've found that
the access times on these arrays are quite significant.

For instance, assume we have an array: arr(3,200,200,200). Our
question is:

a) Is there any well-known way of optimizing access to
such arrays in Fortran?

First, make sure that loops are nested such that the inner loops
correspond to the leftmost subscripts, if possible.
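
As a rough sketch of that ordering for the arr(3,200,200,200) array from the question (the loop variable names and the summation are my own assumptions, not the original poster's code):

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 200
  real, allocatable :: arr(:,:,:,:)
  real :: total
  integer :: c, i, j, k

  allocate(arr(3, n, n, n))
  call random_number(arr)

  total = 0.0
  ! Fortran stores arrays column-major: the leftmost subscript varies
  ! fastest in memory, so it belongs in the innermost loop.
  do k = 1, n            ! rightmost subscript: outermost loop
    do j = 1, n
      do i = 1, n
        do c = 1, 3      ! leftmost subscript: innermost loop
          total = total + arr(c, i, j, k)
        end do
      end do
    end do
  end do
  print *, total
end program loop_order
```

With this nesting, consecutive iterations touch consecutive memory locations.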

In cases where it isn't possible, such as matrix multiply, which goes
through arrays in different directions, doing it in blocks can help.
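
A minimal sketch of blocking for matrix multiply follows. The matrix size n and block size bs are assumed values for illustration only; a good block size depends on the cache of the machine at hand.

```fortran
program blocked_matmul
  implicit none
  ! n and bs are illustrative assumptions; tune bs for the target machine.
  integer, parameter :: n = 256, bs = 32
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: i, j, k, jj, kk

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0

  ! Block the j and k loops so that a panel of bs columns of a is
  ! reused for bs columns of c while it is still likely to be in cache.
  do jj = 1, n, bs
    do kk = 1, n, bs
      do j = jj, min(jj + bs - 1, n)
        do k = kk, min(kk + bs - 1, n)
          do i = 1, n      ! leftmost subscript of a and c is innermost
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
        end do
      end do
    end do
  end do

  ! Sanity check against the intrinsic; the difference should be tiny.
  print *, maxval(abs(c - matmul(a, b)))
end program blocked_matmul
```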

Loop unrolling can also help.
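
For example, a short loop over a first dimension of extent 3 can be written out by hand, as in the sketch below (array size and variable names are assumptions for illustration). Many compilers will do this themselves at higher optimization levels, so it is worth timing both versions.

```fortran
program unroll_demo
  implicit none
  integer, parameter :: n = 50
  real, allocatable :: arr(:,:,:,:)
  real :: s1, s2, s3
  integer :: i, j, k

  allocate(arr(3, n, n, n))
  call random_number(arr)

  s1 = 0.0
  s2 = 0.0
  s3 = 0.0
  do k = 1, n
    do j = 1, n
      do i = 1, n
        ! The length-3 loop over the first subscript is unrolled by hand.
        s1 = s1 + arr(1, i, j, k)
        s2 = s2 + arr(2, i, j, k)
        s3 = s3 + arr(3, i, j, k)
      end do
    end do
  end do
  print *, s1, s2, s3
end program unroll_demo
```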

In many cases, the solutions aren't so general but require understanding
the specific details of the problem.

The cost of the subscript calculation should be approximately
proportional to the number of subscripts, and on most modern machines
it should be enough faster than the array access itself (especially
a store) that the number of subscripts isn't the problem. Any
operation on a 24 million element array will be slow if not done in
the right order.
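
To show what the subscript calculation amounts to, here is a small sketch that computes the column-major position of a four-subscript element by hand and compares it with the flattened array. The extents and index values are arbitrary assumptions.

```fortran
program offset_demo
  implicit none
  integer, parameter :: n1 = 3, n2 = 4, n3 = 5, n4 = 6
  real :: arr(n1, n2, n3, n4), flat(n1*n2*n3*n4)
  integer :: c, i, j, k, pos

  call random_number(arr)
  flat = reshape(arr, [n1*n2*n3*n4])   ! same elements in storage order

  c = 2
  i = 3
  j = 4
  k = 5
  ! Column-major position of arr(c,i,j,k): one multiply and one add
  ! per additional subscript when written in nested (Horner) form.
  pos = c + n1*((i - 1) + n2*((j - 1) + n3*(k - 1)))

  print *, pos, arr(c, i, j, k) == flat(pos)
end program offset_demo
```

A few integer multiplies and adds per access is small next to a cache miss, which is why the access order matters more than the subscript count.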

b) If we "unroll" the array arr into arr1(of dimensions 200x200x200),
arr2 and arr3, would the speed of access improve dramatically?

If you access all three in the same loop, the change won't be dramatic.

If by "unroll" you mean separating both the arrays and the loops, it
might be.
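
A sketch of the difference, assuming a loop that needs only the first of the three components (the names arr1, s_strided and s_contig and the reduced size n are made up for illustration):

```fortran
program split_components
  implicit none
  integer, parameter :: n = 100
  real, allocatable :: arr(:,:,:,:), arr1(:,:,:)
  real :: s_strided, s_contig
  integer :: i, j, k

  allocate(arr(3, n, n, n), arr1(n, n, n))
  call random_number(arr)
  arr1 = arr(1, :, :, :)     ! component 1 copied into its own array

  ! Combined array: using only component 1 touches every third element.
  s_strided = 0.0
  do k = 1, n
    do j = 1, n
      do i = 1, n
        s_strided = s_strided + arr(1, i, j, k)
      end do
    end do
  end do

  ! Separate array and separate loop: unit-stride access.
  s_contig = 0.0
  do k = 1, n
    do j = 1, n
      do i = 1, n
        s_contig = s_contig + arr1(i, j, k)
      end do
    end do
  end do

  print *, s_strided, s_contig
end program split_components
```

The first loop pulls all three components through the cache even though it uses one; the second reads only what it needs.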

We are using Intel Fortran compilers, in case that matters.

Can you post the set of nested loops accessing the array?

Otherwise, testing with the following program shows that
accessing a six-dimensional array takes about twice as long as
accessing a one-dimensional array.

Accessing the six dimension array in the wrong order (the
second set of nested loops) takes three times as long as the
right order, or six times as long as the single loop.

      real x(10,10,10,10,10,10), y(1000000),xx,yy
      integer i1,i2,i3,i4,i5,i6,i
      integer*8 rdtsc,t0,t1,t2,t3
      call random_number(x)
      y=transfer(x,y)
      xx=0
      yy=0
      ! right order: leftmost subscript in the innermost loop
      t0=rdtsc(0)
      do i1=1,10
        do i2=1,10
          do i3=1,10
            do i4=1,10
              do i5=1,10
                do i6=1,10
                  xx=xx+x(i6,i5,i4,i3,i2,i1)
                enddo
              enddo
            enddo
          enddo
        enddo
      enddo
      t1=rdtsc(0)
      ! single loop over the same data as a one dimension array
      do i=1,1000000
        yy=yy+y(i)
      enddo
      t2=rdtsc(0)
      print *,xx,yy
      print *,y(12346),x(6,5,4,3,2,1)
      print *,t1-t0,t2-t1
      xx=0
      yy=0
      ! wrong order: leftmost subscript in the outermost loop
      t0=rdtsc(0)
      do i1=1,10
        do i2=1,10
          do i3=1,10
            do i4=1,10
              do i5=1,10
                do i6=1,10
                  xx=xx+x(i1,i2,i3,i4,i5,i6)
                enddo
              enddo
            enddo
          enddo
        enddo
      enddo
      t1=rdtsc(0)
      ! single loop again, this time running backwards
      do i=1000000,1,-1
        yy=yy+y(i)
      enddo
      t2=rdtsc(0)
      print *,xx,yy
      print *,y(12346),x(6,5,4,3,2,1)
      print *,t1-t0,t2-t1
      end

rdtsc.s contains:

.file "rdtsc.f"
.text
.p2align 4,,15
.globl rdtsc_
.type rdtsc_, @function
rdtsc_:
rdtsc
ret
.size rdtsc_, .-rdtsc_

(assuming an IA32 processor)
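
On other processors, or to avoid the assembly file, the standard system_clock intrinsic could stand in for rdtsc. This is only a sketch of that substitution, not what the timings above were measured with, and its resolution depends on the compiler and system.

```fortran
program clock_demo
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8) :: t0, t1, rate
  real :: y(1000000), yy
  integer :: i

  call random_number(y)
  yy = 0.0

  call system_clock(t0, rate)    ! rate is clock ticks per second
  do i = 1, size(y)
    yy = yy + y(i)
  end do
  call system_clock(t1)

  ! Printing yy keeps the compiler from discarding the loop.
  print *, yy, real(t1 - t0) / real(rate), ' seconds'
end program clock_demo
```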

-- glen
