Re: efficiency change by dynamic allocation in microprocessor



Thank you for your kind advice.

I have been programming matrix library with fixed size array.

In example, I have made a structure like this:

typedef struct {
int row;
int col;
double M[1000];
} Matrix;

And I have used matrix as follows:

Matrix m_A, m_B, m_C;
...
void function_name(Matrix &A,Matrix &B)
{
...
}
void main(void)
{
...
function_name(&A,&B);
...
}

Of course, I am afraid of increasing of cache miss rate caused by a lot of
matrix structure used (with big -1000- size of array).

Do you mean should I use just one array such as A[100000].
and then I should utilize the adequate portions of array for each matrix
using index technique?

or my mehtod is right?

regards.
J.J.

>This of course depends how often the new/delete is called. If
>temporary objects are created for each matrix operation, the dynamic
>memory handling could easily consume most of the time. The advantage
>with allocating arrays of the correct size is that all elements are
>close together and with cache and/or virtual memory, quite high hit
>rates can be achieved with few cache conflicts.
>
>While a large fixed 2-D array does not have run time overhead.
>However, if only part of each line and column is used, the actual data
>is distributed all around the big array into small segments. This can
>be bad for the cache performance. At least make sure that each
>segment starts at the beginning of a cache lane.
>
>Especially with primitive cache algorithms, such as the single
>associative 1 to 1 mapping, a physical address can only be loaded into
>a specific cache address. If the fixed 2-D array allocation is done
>carelessly, two adjacent lines or columns, which are in wildly
>separate memory segments could map to the same cache address, causing
>frequent reloads and dramatically dropping the performance. Using
>array dimensions that are _not_ a power of 2 usually avoids this
>problem. Anyway, it is important to know how the cache system works on
>a specific processor and design the table allocation accordingly.
>
>One way around this would be to allocate just a big 1-D vector and at
>each matrix reference calculate the actual 1-D element index. In this
>way all the used elements would be close together. For each matrix,
>define a macro
>
>#define A(i,j) a[(i)*a_dim1+(j)]
>
>To compare the performance with fixed 2-D allocation or dynamically
>allocated 2-D, simply redefine
>
>#define A(i,j) a[i][j]
>
>or even
>
>#define A(i,j) this->a[i][j]
>
>
>By defining macros for each matrix,you could always write the program
>nearly in good old FORTRAN style :-) e.g. like
>
> C(1,4) = A(2,4) + B(3,1) ;
>
>After performance tests the best implementation could be selected,
>without making the program hard to read.
>
>Paul
>
>



This message was sent using the comp.arch.embedded web interface on
www.EmbeddedRelated.com
.



Relevant Pages

  • Re: efficiency change by dynamic allocation in microprocessor
    ... >I would like to know whether it is good to make matrix class using dynamic ... rates can be achieved with few cache conflicts. ... While a large fixed 2-D array does not have run time overhead. ... If the fixed 2-D array allocation is done ...
    (comp.arch.embedded)
  • Re: best approach in order to experimentally confirm performance of shells ort?
    ... I would count the number of moves of elements in the array and the ... cache, that will go very quickly, but as soon as you hit a point ... he may see this as a discontinuity in the graph. ... Other than CPU time, it isn't time consuming. ...
    (comp.programming)
  • Re: repeater
    ... Regarding Cache, you would not store the answers in the Cache. ... If you use an array (not just to save the answers, ... > The reason why I used a repeater to display my questions is because my ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Thats not the Answer to Your Question
    ... You are probably unwittingly accessing off the end of the array and ... The code has an avoidable multiple access to the same cache line in ... loop in a more cache unfriendly way. ... make this trivial optimisation (even some basic interpretters can ...
    (sci.image.processing)
  • Re: Can this type of cache miss be reduced?
    ... as loop interchange, loop blocking, etc. ... But, for a large one-dimensional array, suppose the elements are only ... the funis doing, and your definition of "cache miss." ... compilers could insert extra code into loops to do ...
    (comp.compilers)