Re: inline



Wilco Dijkstra wrote:
"David Brown" <david@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:46e1219c$0$27823$8404b019@xxxxxxxxxxxxxxxxxx
Wilco Dijkstra wrote:
"Bob" <SkiBoyBob@xxxxxxxxxx> wrote in message news:5k8enmF2i34vU1@xxxxxxxxxxxxxxxxxxxxx
karthikbalaguru wrote:
<snip>

extern inline function definition - Stand-alone object code is never
emitted.
normal inline function definition - Stand-alone object code is always
emitted
static inline function definition - Stand-alone object code may be
emitted if required.

But still, in which practical scenarios do we need each of these?

Where is the practical use/application of these ('static inline
function', 'extern inline function' or 'normal inline function')?

Thanks in advance,
Karthik Balaguru
pro: speed. cpu doesn't execute a call/return or push/pop instructions. In cached systems the inline code is already in the cache.

con: size. you duplicate the inline code as many times as you use the "function", thus, bigger.
Most compilers are smart about inlining and only use inline as a hint, so
large inline functions, or medium sized ones that are called frequently are
not inlined. Similarly small functions that are not marked inline may be
inlined automatically. In C++ such smart inlining significantly improves
codesize compared to always inlining or never inlining.

Actually, compilers will also happily inline large functions automatically - if they are only ever called once. That's one of the several advantages in using "static" unless a function really needs to be global.

It's generally a bad idea to inline large functions, even if they are static and
only called once. Apart from there being little gain possible in such cases,
register allocators work better on smaller functions with low register
pressure, so inlining big functions often results in worse code. There are
techniques that can reduce this effect, but they basically involve splitting
off regions with high register pressure and independently allocating each
region. Not inlining calls to large functions gives you this for free.


I'm inclined to leave the details of that to the compiler writers - if they think that their register allocators will have problems when working on very large functions, then they would not use automatic inlining on single-use large functions. gcc has dozens of command-line options for fine-tuning and testing this sort of thing (presumably so do other compilers, although they may be limited to internal builds) - I expect their automatic testing to give them reasonable values to balance the size of the combined functions with other factors (code size and speed, and compile-time memory usage and speed). For example, on cpus with large numbers of registers, such automatic inlining can make a big difference as the overhead for calling big functions will be more significant (pushing and popping more registers), whereas for register-poor cpus it makes little difference. I don't expect it to be perfect, of course - "optimisation" is a misnomer.

Back to the original question: use static inline when the function is defined
in a C file and static. Use inline when the function is defined in a header.
I think it's better to use "static inline" in both cases. Obviously if a function is defined in a C file, and only ever used in that module, then it should be "static" anyway. But I think "static inline" makes sense in headers too - you don't risk generating extra code unless the function can't be inlined for some reason (and gcc can warn you about that if you want), whereas with a plain "inline" you will get extra code. The only complication I can see is if you need static locals in the inlined function.
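A minimal sketch of the "static inline in a header" approach, assuming a hypothetical header called clamp.h with a hypothetical helper clamp_int (neither name comes from this thread):

```c
/* clamp.h (hypothetical header name, for illustration only) */
#ifndef CLAMP_H
#define CLAMP_H

#include <assert.h>

/* static inline: every translation unit that includes this header gets
   its own private definition, and no external symbol is created.
   If every call is inlined, no stand-alone copy needs to be emitted. */
static inline int clamp_int(int value, int low, int high)
{
    assert(low <= high);    /* caller must pass a valid range */
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

#endif /* CLAMP_H */
```

A caller just includes the header and writes something like clamp_int(reading, 0, 1023). Note that a static local variable inside such a function would be separate in each translation unit, which is the complication mentioned above.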

The problem with using static inline in a header is that you may end up
with multiple out-of-line copies of the same function (and the behaviour is as if
you wrote several independent copies rather than a single function called from
multiple places). For really small functions it doesn't really matter as you'd
expect them to be always inlined anyway.


You will only get multiple out-of-line copies if you use the static inline functions in a non-inlinable fashion (such as taking their address, or disabling inlining for debugging). As long as you don't force the compiler to generate the outlined copy, no such copies will be generated. If you find that your function is often used both inline and out of line, then you are probably better off re-structuring it into two versions of the function.
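To illustrate the difference between an inlinable use and a use that forces an out-of-line copy, here is a rough sketch. All names (square, sum_of_squares, get_square, apply) are made up for the example, and whether a given call is actually inlined depends on the compiler and its optimisation settings:

```c
#include <assert.h>
#include <stddef.h>

static inline int square(int x)
{
    return x * x;
}

/* Direct calls: an optimising compiler can typically inline both of
   these, so no stand-alone copy of square() is needed for them. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

typedef int (*unary_fn)(int);

/* Taking the address: the function now needs a real address, so the
   compiler has to emit an out-of-line copy in this translation unit. */
unary_fn get_square(void)
{
    return square;
}

/* Call through a pointer; such calls generally go to the out-of-line copy. */
int apply(unary_fn f, int x)
{
    assert(f != NULL);
    return f(x);
}
```

If get_square() lived in a header and were used from several C files, each of those files would carry its own copy of square().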

In C99 you're pretty much forced to use static inline in headers, but in C++ it's
better to use plain inline. I'm not sure why you think this generates extra code -
the idea of it is precisely to only emit an out-of-line copy when required (so no
compilation resources are wasted), and if there are multiple copies they must all
be shared (so codesize is as small as possible).


That's because C++ "inline" is not exactly the same as C99 "inline" (to my understanding - which may be wrong, of course), and I was talking about C "inline" rather than C++. When C++ was developed, there were subtle changes to some of the default linkages so that a plain "inline" worked in an ideal fashion (as you described) - plain "const" data is similar. But when C copied the "inline" qualifier, it could not change the linkage defaults without changing the language (since "inline" is merely a hint in C, rather than a language feature) - thus you end up with "static inline". So for C++, plain "inline" is perhaps best while "static inline" is best for C (or for headers that are used by both languages).
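For comparison, this is roughly how plain "inline" is arranged in C99 when a single shared out-of-line copy is wanted. It is only a sketch under C99 (or later) rules, not the older GNU89 rules. The names twice and quadruple are hypothetical, and the header and the C file are shown together in one listing:

```c
#include <assert.h>
#include <limits.h>

/* Would normally live in a header: a C99 "inline definition".
   On its own it does not provide an external (stand-alone) definition. */
inline int twice(int x)
{
    return 2 * x;
}

/* Would normally live in exactly one C file: this declaration turns the
   definition above into the external one for the whole program, so any
   call that is not inlined has a single shared copy to link against. */
extern inline int twice(int x);

int quadruple(int x)
{
    assert(x >= INT_MIN / 4 && x <= INT_MAX / 4);   /* avoid overflow */
    return twice(twice(x));
}
```

Leaving out the extern declaration can give an undefined reference at link time whenever the compiler decides not to inline a call, which is one practical reason "static inline" is the simpler choice for C headers.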


Wilco

