Re: Program speed execution question




"Richard E Maine" <nospam@xxxxxxxxxxxxx> wrote in message
news:nospam-0F7BD0.12463223042005@xxxxxxxxxxxxxxxxxxxxx
> In article <v61l611sl8b9p3v6djo4gv6uu3nhjif697@xxxxxxx>,
> Joe Hill <georgecostanz50@xxxxxxxxxxx> wrote:
>
>> We have a program that we are running on both Xeon 32 bit and Opteron 64
>> bit cpus. The program runs much faster on the 32 bit Xeon processors. The
>> run time (wall clock) is as follows :
>>
>> Xeon 32 bit = .017 wall clock-hours
>> Opteron 64 bit = .309 wall clock-hours
>
> Some difference could be explained several ways, but that's a pretty
> darned big difference for any of the explanations. You say you were
> using the same compiler for both (or anyway, that's how I interpreted
> what you said), but maybe it is just the same version number. Anyway, I
> can't explain that part.
>
>> The internal customer then examined the code and changed the way arrays
>> are allocated.
>>
>> Old Way : [pointers]
>> New Way : [allocatables]
>
>> Changing the array allocation decreased the wall clock time to almost
>> nothing on both types of cpus according to our internal customer.
>> Can anyone explain
>
> Others have talked about aliasing, but I'd guess that to be the wrong
> explanation here. Aliasing can be important, but I wouldn't expect to
> see changes quite as big as you describe except possibly in the most
> contrived special cases. However...
>
> I have personally seen *HUGE* differences between allocatable and
> pointer arrays because allocatables are known at compile time to be
> contiguous, whereas pointers are not. In some compilers, this causes
> unnecessary copy-in/copy-out operations. That can result in performance
> penalties that are almost arbitrarily large when huge arrays get copied
> around just to perform trivial operations on single elements.

Aliasing could account for as much as a factor of 5 in performance on the
Xeon if it makes the difference between vectorizing and not vectorizing. The
difference on the Opteron would be smaller, but still significant, for single
precision. A larger factor might come about if temporary arrays were being
allocated in an inner loop and the new declaration lets the optimizer
eliminate them.
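
For what it's worth, here is a minimal sketch of the two declaration styles being compared. It is not the original poster's code: the names axpy_ptr, axpy_alloc and axpy_demo are made up for illustration, and whether a given compiler actually vectorizes one loop and not the other is an assumption you would have to check against its optimization report.

```fortran
module axpy_demo
  implicit none
contains

  ! Pointer version: x and y may overlap or point at strided sections,
  ! so the compiler has to be conservative unless it can prove otherwise.
  subroutine axpy_ptr(x, y, s)
    real, pointer :: x(:), y(:)
    real, intent(in) :: s
    integer :: i
    do i = 1, size(y)
      y(i) = y(i) + s * x(i)
    end do
  end subroutine axpy_ptr

  ! Allocatable version: the arrays are contiguous, and by the language
  ! rules the two dummies cannot overlap, which leaves the loop free
  ! to vectorize.
  subroutine axpy_alloc(x, y, s)
    real, allocatable, intent(in)    :: x(:)
    real, allocatable, intent(inout) :: y(:)
    real, intent(in) :: s
    integer :: i
    do i = 1, size(y)
      y(i) = y(i) + s * x(i)
    end do
  end subroutine axpy_alloc

end module axpy_demo

program compare
  use axpy_demo
  implicit none
  integer, parameter :: n = 1000   ! hypothetical problem size
  real, pointer     :: px(:), py(:)
  real, allocatable :: ax(:), ay(:)

  allocate(px(n), py(n), ax(n), ay(n))
  px = 1.0; py = 2.0
  ax = 1.0; ay = 2.0

  call axpy_ptr(px, py, 0.5)
  call axpy_alloc(ax, ay, 0.5)

  ! Both versions compute the same values; only the generated code may differ.
  print *, maxval(abs(py - ay))

  deallocate(px, py, ax, ay)
end program compare
```

Timing the two routines with a much larger n, single precision, on each machine would show how much of the reported difference this effect can explain.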

