Re: LEN_TRIM performance issue



On 2006-12-20 11:45:36 -0400, olof.liungman@xxxxxxxxx said:

Thank you everyone for valuable input. Here are some comments and
answers to questions, as well as a follow-up question.

For numerical code, it would be unusual for LEN_TRIM, or character
manipulation in general, to be in the inner loop. How much real time,
instead of the percentage, is this taking?

In the test run I profiled, which actually pretty well represents an
actual run (the code is used in an operational oil spill forecasting
system with a web interface; Seatrack Web), the total run time was
about 11 sec and the time spent in LEN_TRIM and CPSTR was about 2 sec
each.

The code consists of several nested loops, which depend on different
time scales in the problem. For example, the forcing input (the fluid
flow field) is read from files at specific intervals but an inner time
loop depends on events occurring, such as a particle reaching the wall
of a computational grid cell. Likewise, internal particle processes
(such as chemical changes in the substance represented by the
particles) may have another time scale, and thus another time step.

In many of the different processes involved (sinking/rising,
turbulence, evaporation, dispersion, etc.) we need to check what
substance it is, read in properties from files, etc. Some of these
control parameters are stored as strings in a derived type. Without
thinking more about this we have used LEN_TRIM to determine the
non-blank length of the part of the character strings that we want to
compare. In short, in some instances LEN_TRIM is indeed within nested
loops.


Many numerical algorithms work on arrays with two or more dimensions,
inside nested loops, such that the inner loop statements get executed
millions or billions of times. That would be unusual for control
parameters and file names. How does the time spent in LEN_TRIM and
CPSTR scale with problem size? Are you using a smaller problem
for testing than usual?

As stated above I profiled a typical run, not a reduced test run.
However, I have not tried running a shorter or longer forecast to see
how the time spent in LEN_TRIM and CPSTR scales. Thanks for the
suggestion.

Here's a follow-up question. gprof seems unable to determine the
"parent" or "child" function of built-in functions such as LEN_TRIM.
Thus, the call graph does not tell me which LEN_TRIM statement is the
major culprit. Does this info about the call graph help to explain
this?

"granularity: each sample hit covers 4 byte(s) for 0.09% of 11.39
seconds"

Any suggestions for how I should identify the specific LEN_TRIM calls
that use up the most time?

If I want to replace LEN_TRIM in a consistent way, my first idea is to
create a derived type that contains two fields: the character string
and the length of non-blank chars in the string. As most character
strings are constants initialized only once, this would greatly reduce
the number of LEN_TRIM calls (and thus CPSTR calls, I suppose...). Any
comments?

//Olof
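
The derived-type idea above (a string stored together with its non-blank
length) might look roughly like the sketch below. It assumes a fixed maximum
length; the names cached_string, make_cached, same_text and max_len are made
up for illustration and do not come from the original code.

```fortran
module cached_string_mod
  implicit none
  private
  public :: cached_string, make_cached, same_text

  integer, parameter :: max_len = 64   ! assumed maximum string length

  ! Hypothetical type: a fixed-length string plus its non-blank length.
  type :: cached_string
    character(len=max_len) :: text = ' '
    integer :: n = 0                   ! LEN_TRIM(text), computed once
  end type cached_string

contains

  ! Build the pair once, e.g. when the control parameters are read in.
  ! Input longer than max_len is silently truncated by the assignment.
  pure function make_cached(s) result(cs)
    character(len=*), intent(in) :: s
    type(cached_string) :: cs
    cs%text = s
    cs%n = len_trim(cs%text)
  end function make_cached

  ! Compare using the stored lengths; LEN_TRIM is not called here.
  pure logical function same_text(a, b)
    type(cached_string), intent(in) :: a, b
    same_text = .false.
    if (a%n /= b%n) return
    same_text = a%text(1:a%n) == b%text(1:b%n)
  end function same_text

end module cached_string_mod

program demo_cached_string
  use cached_string_mod
  implicit none
  type(cached_string) :: substance, wanted

  substance = make_cached('crude_oil')
  wanted    = make_cached('crude_oil   ')   ! trailing blanks do not count

  print *, same_text(substance, wanted)     ! prints T
  print *, substance%n                      ! prints 9
end program demo_cached_string
```

Fortran's == on character operands of different lengths already treats the
shorter one as if it were padded with blanks, so for plain equality tests the
stored length may not be needed at all. It matters mainly where the code
takes substrings up to the non-blank length.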

In an application where I have text names for objects and attributes of the
objects I read all the text in and enter the names in a dictionary. A name
can then be replaced by an index into the dictionary. A table of attributes
is then just an ordinary array of data. The general strategy of remembering
the result of work that has been done previously is called "tabling of
function values". Converting text names to indices is a simple example.
Converting text input to tables is a larger example that is so automatic
that most folks would not even think of it as a performance-enhancing
transformation.
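
A minimal sketch of the name-to-index idea, assuming a small table with a
linear search (a real dictionary would probably hash or sort). The module
name_table_mod, the function name_index and the limits max_names and name_len
are invented for this illustration.

```fortran
module name_table_mod
  implicit none
  private
  public :: name_index, max_names

  integer, parameter :: max_names = 100, name_len = 32   ! assumed limits
  character(len=name_len) :: names(max_names)
  integer :: n_names = 0

contains

  ! Return the index of name, entering it in the table if it is new.
  ! Assumes names are never longer than name_len characters.
  integer function name_index(name) result(idx)
    character(len=*), intent(in) :: name
    integer :: i
    do i = 1, n_names
      if (names(i) == name) then     ! blank-padded comparison
        idx = i
        return
      end if
    end do
    if (n_names == max_names) error stop 'name table full'
    n_names = n_names + 1
    names(n_names) = name
    idx = n_names
  end function name_index

end module name_table_mod

program demo_name_table
  use name_table_mod
  implicit none
  integer :: i_oil, i_diesel
  real :: attribute(max_names)       ! attribute table indexed by name index

  ! Text is converted to indices once, while the input is read.
  i_oil    = name_index('crude_oil')
  i_diesel = name_index('diesel')
  attribute(i_oil)    = 1.0          ! placeholder values, not real data
  attribute(i_diesel) = 2.0

  ! Inner loops can then work with integer indices instead of strings.
  print *, i_oil, i_diesel, name_index('crude_oil')   ! prints 1 2 1
end program demo_name_table
```

With this arrangement the string comparisons happen only at input time, and
the loops that run many times see nothing but integers and arrays.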

Profiling has the problem of not knowing how to assign work back up the
call nesting list. A long time ago in a Fortran 66 execution profiler
I had such a capability. But it relied on instrumenting the code and
compiling the instrumented code. It was then able to know when routines
were entered and exited. With that information one could report both the
work done locally and all the work between the enter and the exit. That
facility became known as the "Bell profiler" as many folks first saw it
in action at the Murray Hill site of Bell Labs. As an instrumenter it
was capable of exact line counts as well as timing. A synthetic "clock"
of the "Fortran machine" which matched the exact line counts was often
as useful as the timings. The timer had considerable overhead so tended
to distort the times for low level utilities.
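
The enter/exit bookkeeping described above can be sketched as follows. This
is only an illustration of the principle, not the Bell profiler itself:
the routine names prof_enter, prof_exit and prof_tick, the integer routine
ids, and the use of a synthetic clock advanced by hand are all assumptions.
A real instrumenter would insert such calls automatically. Recursive
routines are not handled.

```fortran
module prof_mod
  implicit none
  private
  public :: prof_enter, prof_exit, prof_tick, local_work, total_work

  integer, parameter :: max_routines = 10, max_depth = 50   ! assumed limits
  integer :: clock = 0                      ! synthetic clock, counts "lines"
  integer :: local_work(max_routines) = 0   ! work done in the routine itself
  integer :: total_work(max_routines) = 0   ! all work between enter and exit
  integer :: stack_id(max_depth), stack_start(max_depth)
  integer :: depth = 0

contains

  ! Record which routine became active and the clock value at entry.
  subroutine prof_enter(id)
    integer, intent(in) :: id
    depth = depth + 1
    stack_id(depth) = id
    stack_start(depth) = clock
  end subroutine prof_enter

  ! Charge n units of work to the currently active routine.
  subroutine prof_tick(n)
    integer, intent(in) :: n
    clock = clock + n
    if (depth > 0) local_work(stack_id(depth)) = local_work(stack_id(depth)) + n
  end subroutine prof_tick

  ! On exit, everything since the matching enter counts as total work.
  subroutine prof_exit()
    integer :: id
    id = stack_id(depth)
    total_work(id) = total_work(id) + (clock - stack_start(depth))
    depth = depth - 1
  end subroutine prof_exit

end module prof_mod

program demo_prof
  use prof_mod
  implicit none
  integer :: i

  call prof_enter(1)          ! routine 1: a driver
  call prof_tick(3)
  do i = 1, 4
    call prof_enter(2)        ! routine 2: a low-level utility
    call prof_tick(5)
    call prof_exit()
  end do
  call prof_exit()

  print *, local_work(1), total_work(1)   ! prints 3 23
  print *, local_work(2), total_work(2)   ! prints 20 20
end program demo_prof
```

The driver does little work locally but is charged with everything done
beneath it, which is the kind of attribution a location counter sampler
alone does not give.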

The notion of instrumenting and line counting was copied from a Knuth
student, Dan Ingalls. That tool had poor reporting and did not try to
look at timing or the problems of call nesting. There was an F77 execution
profiler as part of the SoftTools (NAGTools?). I believe that plusFORT(?)
has a similar capability. I have never seen enough detail on either to
know how they deal with call nesting.

Most "profilers" seem to be location counter samplers that rely on load
maps. Figuring out a call nesting would be a lot of work so it is not done.
The advantage is that they do not need to know about which language is
involved as they are basically assembly driven. Sampling has its own
set of problems. Location counter samplers have lots of benefits but
calling them "profilers" has managed to lose/hide a lot of what the true
profilers can do.


