Re: performance of accessing array using indexes or pointers



On Wed, 12 Apr 2006 13:16:41 +0200, Toon Knapen <spamtrap@xxxxxxxxxx>
wrote:

For a vector and matrix library project, I wish to evaluate the
performance difference of looping through vectors (i.e. arrays) using an
index or using pointers in C. There are many different opinions on this
issue and there have been many debates on this in the C/C++ world so I
decided to take a look at the assembler generated by the compiler using
the two different access methods.

(If somebody can point me to previous work on this issue I would be
happy to get a pointer to this work)


A good starting point:
http://webster.cs.ucr.edu/AoA/index.html

Look under stack frames. Look below for my answers.

So here are the two respective C routines

<findex.c>
int findex(int* array, int size)
{
int result = 0 ;
int i = 0 ;
for( ; i < size ; ++i) {
result += array[i] ;
}
return result ;
}
</findex.c>

<poiner.c>
int point(int* array, int size)
{
int result = 0 ;
int* it = array ;
int* end = array + size ;
for( ; it != end ; ++it ) {
result += *it ;
}
return result ;
}
</pointer.c>

gcc 3.4.3 with the -O3 option generated following code for each

<findex.s>
.file "index.c"
.text
.p2align 4,,15
.globl findex
.type findex, @function
findex:
pushl %ebp
xorl %eax, %eax
movl %esp, %ebp
pushl %ebx
movl 12(%ebp), %ecx
xorl %edx, %edx
movl 8(%ebp), %ebx
cmpl %ecx, %eax
jmp .L10
.p2align 4,,7
.L12:
addl (%ebx,%edx,4), %eax
result += array[i] ;

the access is done at 32 bits therefore 4 is required.
ebx is base for array while edx is the index.
~array[index * 4]
access is always done at byte level.



incl %edx
increment the index. edx is the variable i.

cmpl %ecx, %edx
i < size
ecx is the variable size. look at the source code above
to determine what is assigned to ecx.

.L10:
jl .L12
popl %ebx
popl %ebp
ret
.size findex, .-findex
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.4.3"
</findex.s>

<pointer.s>
.file "point.c"
.text
.p2align 4,,15
.globl point
.type point, @function
point:
pushl %ebp
xorl %ecx, %ecx
movl %esp, %ebp
movl 8(%ebp), %edx
movl 12(%ebp), %eax
leal (%edx,%eax,4), %eax
jmp .L10
.p2align 4,,7
.L12:
addl (%edx), %ecx
result += *it ;
edx is * it
while ecx is result
addl $4, %edx
increment the pointer for var it.
.L10:
cmpl %eax, %edx
it != end
compare both pointers.

In conclusion. the second src code SHOULD be faster a bit b'cos there
is no translate for the memory access shown here.
addl (%edx), %ecx
while the first src code has to decode the instruction for the memory
access
addl (%ebx,%edx,4), %eax

There are other issues to worry about like mem alignment, cache misses
and etc. You may want to go through this book:

1)Code Optimization: Effective memory usage.



regards,
john


jne .L12
popl %ebp
movl %ecx, %eax
ret
.size point, .-point
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.4.3"
</pointer.s>

The loop using indexing consists mainly of:

addl (%ebx,%edx,4), %eax
incl %edx
cmpl %ecx, %edx
jl .L12

while using pointers it consists of

addl (%edx), %ecx
addl $4, %edx
cmpl %eax, %edx
jne .L12

so the main loop consists in both cases of 4 instructions.

The question now is: Is there a reason to assume that one of these would
be executed faster as the other? Is there a way to verify that both
execute in the same number of clock-ticks? And if there is a performance
difference, given that both methods are equivalent in C, why does'nt gcc
generate the same code in both cases (although I probably better ask
this in a gcc related newsgroup but I first wanted more info on the
equivalence (or not) of the assembler listings) ?

Thanks a lot in advance,

toon

.



Relevant Pages