Re: Profiler for Linux



gpderetta wrote:

Noob wrote:

LinuxAsm wrote:

Does anyone know of a good code profiler for Linux?

I recommend oprofile over gprof.
http://oprofile.sourceforge.net/about/

Valgrind is extremely good for both source-level
and assembler-level profiling (you get an execution count
for every single instruction!).

It also does cache (both data and instruction) and memory
allocation profiling.

Kcachegrind is the perfect companion for examining and
navigating valgrind output.

The downside of valgrind is that your program will run very
slowly (as much as 30x slower) when profiling. But I have found it
incomparably more useful than gprof. I have no experience with oprofile.

If a profiler is too intrusive (high overhead), then it is not profiling the application, but the combination of the application AND the profiler itself. The impact of cache misses will be reported incorrectly because the timing is different, and the profiler itself will induce extraneous data accesses and cache misses.

cf. http://en.wikipedia.org/wiki/Observer_effect
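To make the point concrete, here is a rough sketch in Python (not x86 asm, and not how valgrind or oprofile work internally) that runs the same toy workload twice: once plain, once with a per-call instrumentation hook installed via sys.setprofile. The names work, square, counting_hook and the value of N are made up for illustration. The results are identical, but the instrumented timing includes the cost of the hook, so it describes the application plus the profiler rather than the application alone.

```python
import sys
import time

def square(x):
    return x * x

def work(n):
    # Toy workload (hypothetical): many tiny function calls,
    # which is the worst case for per-call instrumentation.
    total = 0
    for i in range(n):
        total += square(i)
    return total

def timed(fn, *args):
    # Return the function's result and its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

call_count = 0

def counting_hook(frame, event, arg):
    # Minimal "profiler": count every Python-level function call.
    global call_count
    if event == "call":
        call_count += 1

def profiled(fn, *args):
    # Run fn under the hook, and always remove the hook afterwards.
    sys.setprofile(counting_hook)
    try:
        return timed(fn, *args)
    finally:
        sys.setprofile(None)

N = 200_000
plain_result, plain_time = timed(work, N)
prof_result, prof_time = profiled(work, N)

print(f"plain:        {plain_time:.4f} s")
print(f"instrumented: {prof_time:.4f} s ({call_count} calls observed)")
```

The exact numbers depend on the machine, so none are quoted here. The instrumented run is normally the slower one, and the gap is pure measurement overhead that a naive reading would attribute to the program.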

Regards.
