# Re: Improving multi-dimension array access performance

*From*: Glen Herrmannsfeldt <gah@xxxxxxxxxxxxxxxx>

*Date*: Fri, 28 Nov 2008 01:03:45 -0700

Rajorshi Biswas wrote:

> We have a large Fortran code base which uses quite a lot of four
> dimensional arrays. Upon profiling the performance, we've found that
> the access times on these arrays are quite significant.
> For instance, assume we have an array: arr(3,200,200,200). Our
> question is:
>
> a) Is there any well-known way of optimizing access to
> such arrays in Fortran?

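First, make sure that loops are nested such that the inner loops correspond to the leftmost subscripts, if possible. Fortran stores arrays in column-major order, so the leftmost subscript is the one that varies fastest in memory.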
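As a minimal sketch of that loop ordering for the arr(3,200,200,200) array from the question: the program name, the loop variable names and the summation are assumptions made for illustration, not code from the original poster.

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 200
  real, allocatable :: arr(:,:,:,:)
  double precision :: total          ! double precision so the sum stays exact
  integer :: c, i, j, k

  allocate(arr(3, n, n, n))
  arr = 1.0
  total = 0.0d0

  ! The outermost loop runs over the rightmost subscript and the innermost
  ! loop over the leftmost one, so consecutive iterations touch adjacent
  ! memory locations.
  do k = 1, n
    do j = 1, n
      do i = 1, n
        do c = 1, 3
          total = total + arr(c, i, j, k)
        end do
      end do
    end do
  end do

  print *, total
end program loop_order
```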

In cases where it isn't possible, such as matrix multiply, which goes through arrays in different directions, doing it in blocks can help. Loop unrolling can also help.
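A rough sketch of what "doing it in blocks" can look like for matrix multiply follows. The matrix size n and block size bs are assumed values chosen only for illustration; a suitable block size depends on the cache of the machine in question.

```fortran
program blocked_matmul
  implicit none
  integer, parameter :: n = 256       ! matrix size (assumed for illustration)
  integer, parameter :: bs = 32       ! block size (assumed; tune per machine)
  double precision, allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: i, j, k, jj, kk

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)
  c = 0.0d0

  ! Step through b in bs-by-bs tiles, so that the columns of a and c
  ! being used are revisited while they are still likely to be in cache.
  do jj = 1, n, bs
    do kk = 1, n, bs
      do j = jj, min(jj + bs - 1, n)
        do k = kk, min(kk + bs - 1, n)
          do i = 1, n                 ! leftmost subscript in the innermost loop
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
        end do
      end do
    end do
  end do

  ! Sanity check against the intrinsic; the difference should be tiny.
  print *, maxval(abs(c - matmul(a, b)))
end program blocked_matmul
```

Whether this beats the compiler's own handling of a plain triple loop (or a tuned library routine) is something to measure rather than assume.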

In many cases, the solutions aren't so general but require understanding the specific details of the problem.

The cost of the subscript calculation should be approximately proportional to the number of subscripts, and on most modern machines it should be enough faster than the array access itself (especially a store) that the actual number of subscripts isn't the problem. Any operation on a 24 million element array will be slow if not done in the right order.
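To make the subscript calculation concrete, here is a small sketch for a six-dimensional array with every extent equal to 10. The program and variable names are made up for the example. Written in nested (Horner) form, each additional subscript costs one multiply and one add.

```fortran
program subscript_offset
  implicit none
  integer, parameter :: n = 10
  real :: x(n,n,n,n,n,n), y(n**6)
  integer :: idx

  call random_number(x)
  y = reshape(x, (/ n**6 /))          ! same elements, in storage order

  ! Position of x(6,5,4,3,2,1) in storage order, leftmost subscript first:
  ! one multiply and one add per additional subscript.
  idx = 1 + (6-1) + n*((5-1) + n*((4-1) + n*((3-1) + n*((2-1) + n*(1-1)))))

  print *, idx
  print *, y(idx) == x(6,5,4,3,2,1)
end program subscript_offset
```

This should print 12346 followed by T, since element x(6,5,4,3,2,1) sits at position 12346 of the flattened array.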

> b) If we "unroll" the array arr into arr1 (of dimensions 200x200x200),
> arr2 and arr3, would the speed of access improve dramatically?

If you access all three in the same loop, the change won't be dramatic. If by "unroll" you mean separating both the arrays and the loops, it might be.
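One possible reading of "separate the arrays and loops" is a loop that needs only one of the three components. The sketch below assumes that case; the names arr1, s_interleaved and s_split and the reduced size n = 100 are chosen only for illustration.

```fortran
program split_arrays
  implicit none
  integer, parameter :: n = 100        ! assumed size, smaller than the 200 in the question
  real, allocatable :: arr(:,:,:,:), arr1(:,:,:)
  double precision :: s_interleaved, s_split
  integer :: i, j, k

  allocate(arr(3,n,n,n), arr1(n,n,n))
  call random_number(arr)
  arr1 = arr(1,:,:,:)                  ! component 1 in its own contiguous array

  ! A loop that needs only component 1, read from the combined array:
  ! consecutive iterations are three elements apart in memory.
  s_interleaved = 0.0d0
  do k = 1, n
    do j = 1, n
      do i = 1, n
        s_interleaved = s_interleaved + arr(1,i,j,k)
      end do
    end do
  end do

  ! The same loop on the separate array: consecutive iterations are adjacent.
  s_split = 0.0d0
  do k = 1, n
    do j = 1, n
      do i = 1, n
        s_split = s_split + arr1(i,j,k)
      end do
    end do
  end do

  print *, s_interleaved, s_split
end program split_arrays
```

If every loop uses all three components together, the combined layout already keeps them adjacent, which is why splitting would not be expected to change much in that case.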

> We are using Intel Fortran compilers, in case that matters.

Can you post the set of nested loops accessing the array?

Otherwise, testing with the following program shows that accessing a six-dimensional array takes about twice as long as accessing a one-dimensional array. Accessing the six-dimensional array in the wrong order (the second set of nested loops) takes three times as long as the right order, or six times as long as the single loop.

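    ! Time summing a six-dimensional array against summing the same
    ! data held in a one-dimensional array.
    real x(10,10,10,10,10,10), y(1000000),xx,yy
    integer i1,i2,i3,i4,i5,i6,i
    integer*8 rdtsc,t0,t1,t2,t3
    call random_number(x)
    ! copy x into y in storage order
    y=transfer(x,y)
    xx=0
    yy=0
    ! right order: the innermost loop runs over the leftmost subscript
    t0=rdtsc(0)
    do i1=1,10
      do i2=1,10
        do i3=1,10
          do i4=1,10
            do i5=1,10
              do i6=1,10
                xx=xx+x(i6,i5,i4,i3,i2,i1)
              enddo
            enddo
          enddo
        enddo
      enddo
    enddo
    t1=rdtsc(0)
    ! single loop over the one-dimensional array
    do i=1,1000000
      yy=yy+y(i)
    enddo
    t2=rdtsc(0)
    print *,xx,yy
    ! these two elements occupy the same position in storage order
    print *,y(12346),x(6,5,4,3,2,1)
    print *,t1-t0,t2-t1
    xx=0
    yy=0
    ! wrong order: the innermost loop runs over the rightmost subscript
    t0=rdtsc(0)
    do i1=1,10
      do i2=1,10
        do i3=1,10
          do i4=1,10
            do i5=1,10
              do i6=1,10
                xx=xx+x(i1,i2,i3,i4,i5,i6)
              enddo
            enddo
          enddo
        enddo
      enddo
    enddo
    t1=rdtsc(0)
    ! single loop again, this time running backwards
    do i=1000000,1,-1
      yy=yy+y(i)
    enddo
    t2=rdtsc(0)
    print *,xx,yy
    print *,y(12346),x(6,5,4,3,2,1)
    print *,t1-t0,t2-t1
    end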

rdtsc.s contains:

        .file "rdtsc.f"
        .text
        .p2align 4,,15
        .globl rdtsc_
        .type rdtsc_, @function
    rdtsc_:
        rdtsc
        ret
        .size rdtsc_, .-rdtsc_

(assuming an IA32 processor)
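For anyone who cannot use the assembly routine, a portable substitute (not what was used for the timings above) is the standard `system_clock` intrinsic. The sketch below times only the one-dimensional loop. It reports wall-clock ticks converted to seconds rather than processor cycles, and the resolution depends on the compiler and platform.

```fortran
program time_sum
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integer kind
  integer(i8) :: t0, t1, rate
  real :: y(1000000)
  double precision :: yy
  integer :: i

  call random_number(y)
  yy = 0.0d0

  call system_clock(t0, rate)        ! rate is the number of ticks per second
  do i = 1, 1000000
    yy = yy + y(i)
  end do
  call system_clock(t1)

  print *, yy                        ! use the result so the loop is not optimized away
  print *, 'elapsed seconds:', dble(t1 - t0) / dble(rate)
end program time_sum
```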

-- glen

