Re: Which GCC Version to use with ARM7 ?



John Devereux wrote:
David Brown <david@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> writes:

For smaller programs, gcc 4.1 has the potential to produce smaller and
faster code by compiling the entire program at once, letting it do
inter-procedural optimisations even across modules.

Something related to this that I found makes a big difference for me
is the compiler switches:

-ffunction-sections -fdata-sections -Wl,--gc-sections

This puts every function and every data object into its own
section. The --gc-sections link option then strips out sections that
are not used. This happens even if they are global and appear in the
same module (source file) as items that *are* used.

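You also need to modify the link control file, changing

*(.data) to *(.data .data.*)
and
*(.text) to *(.text .text.*)

to pick up the modified section names. Keep the plain name in the
list as well, since the .text.* pattern does not match a section
called just .text. The same applies to .bss and .rodata if your
script lists them.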

This then allows you e.g. to write libraries with lots of extra
functions in them, many of which functions might not get used in every
application.

This seems to work fine in 3.4 (as well as 4.1, presumably).


Yes, this works (on most gcc targets) for modern gcc versions. The fun with gcc 4.1 comes when you use the "-combine" and "-fwhole-program" options (along with -O2 or -O3 optimisation). The -combine option tells the compiler to take all the C files on the command line together and compile them as a single unit, including inter-procedural optimisations across them. The -fwhole-program flag can be thought of as creating a new scope level between global and file static, with ordinary global or extern data and functions falling into this level. Only "main" and items explicitly declared with the "externally_visible" attribute remain at the true global level. Thus the compiler knows all uses of ordinary global data and code, and can optimise accordingly.

For example, supposing you have a function in a file "uart.c" such as:

void setBaud(unsigned int newBaud) {
    unsigned int divisor = (osc / 16) / newBaud;
    divLoReg = (divisor & 0xffff);
    divHiReg = (divisor >> 16);
}

with "osc" being defined as a constant in a different module. Another module, say "protocol.c", calls this function as "setBaud(19200)".

In many cases, the setBaud function is only ever called from one place in the program, and with a constant value. Yet the compiler must generate the full function, and use an expensive division operation even though all the values are known at compile time. The traditional way to improve this is by making setBaud a macro or, better, a static inline function.
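For reference, the static inline version might look something like the sketch below. The register variables and the oscillator value are stand-ins I have made up for illustration; on a real target divLoReg and divHiReg would be memory-mapped UART registers and osc would come from your board configuration.

```c
#include <stdint.h>

/* Stand-ins for this sketch only: on real hardware these would be
   memory-mapped UART divisor registers, and osc would come from the
   board's clock configuration. The value below is an assumption. */
static volatile uint32_t divLoReg;
static volatile uint32_t divHiReg;
static const uint32_t osc = 14745600u;   /* assumed oscillator, in Hz */

/* Placed in a header so that every caller sees the body. With a
   constant argument and optimisation enabled, the compiler can
   usually fold the division away at compile time. */
static inline void setBaud(uint32_t newBaud) {
    uint32_t divisor = (osc / 16) / newBaud;
    divLoReg = (divisor & 0xffff);
    divHiReg = (divisor >> 16);
}
```

With the assumed clock, a call such as setBaud(19200) gives a divisor of 48, so an optimising compiler would typically reduce the call to two stores of constants.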

Using the "-combine" option, if uart.c and protocol.c are compiled at the same time, the compiler can inline the definition of setBaud into its call site in protocol.c, and reduce the whole thing down to a couple of memory operations. The stand-alone code for the setBaud function is still generated, of course, which is a waste of space. It can be removed using the "-ffunction-sections" method described by John above, or by using the "-fwhole-program" flag, which lets the compiler figure out that it doesn't have to generate code for setBaud at all.

Obviously a function like this one, which is called once, is not time-critical - but the principle applies.
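One thing to watch with -fwhole-program is anything that is referenced only from outside the C files being combined, such as an interrupt handler named in an assembly vector table. A rough sketch of how that might be marked, using gcc's "externally_visible" attribute; the names timerIsr, getTicks and tickCount are hypothetical and not from any real project:

```c
#include <stdint.h>

/* Hypothetical tick counter, written only by the handler below. */
static volatile uint32_t tickCount;

/* Assumed to be referenced only from a vector table outside the C
   sources. Without the attribute, -fwhole-program may treat it as
   unused and discard it. */
__attribute__((externally_visible))
void timerIsr(void) {
    tickCount++;
}

/* An ordinary global function: under -fwhole-program the compiler
   treats it much like a static one, so it can be inlined into its
   callers and dropped if nothing else needs it. */
uint32_t getTicks(void) {
    return tickCount;
}
```

The build would then be something along the lines of "gcc -O2 -combine -fwhole-program" followed by all the C files of the program in a single invocation.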

That's the theory, anyway - I don't know how well it works in practice other than for a simple test case on the Coldfire.

Best regards,

David