Re: asm volatile

Wilco Dijkstra wrote:
"David Brown" <david@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
Chris H wrote:
In message <47a7a6b0$0$23838$8404b019@xxxxxxxxxxxxxxx>, David
Brown <david.brown@xxxxxxxxxxxxxxxxxxxxxxxxxx> writes
Chris H wrote:
In message <60pbeoF1rnfs9U1@xxxxxxxxxxxxxxxxx>, Nils
<n.pipenbrinck@xxxxxxxxx> writes
aamer schrieb:
Dear all, I was wondering what does "asm volatile" do. Best regards

asm volatile is gcc-specific syntax (why does everyone else
here make stupid jokes about this? It's a valid question,

Only for gcc. As pointed out it is certainly not even
vaguely sensible C

Most (all?) embedded C compilers support inline assembly,

and many have some way of indicating a particularly "volatile"
assembly passage. The syntax will vary, but the principle is
the same.

and imho a "must know" for an embedded engineer. Shame on
ABSOLUTELY NOT. I have been doing embedded for 30 years and
never seen it.
Are you telling us that all the embedded compilers you have
used are so much behind gcc
No, generally far more advanced.

in this area that they have no way to distinguish between
"volatile" inline assembly passages and "optimisable" assembly
passages that can, for
Words fail me.
Then let me try to help.

Suppose your processor has support for instructions that have no
direct equivalent in C - rotations, DSP-type instructions, etc.,
and you want your code to be as small and fast as possible. For
example, you might define an assembly function or macro "satadd(a,
b)" which does a saturated addition of two numbers. With
optimisable inline assembly, the compiler would know that this is a
"const" or "pure" function - one whose results depend entirely on
its inputs, and has no side effects.

An intrinsic would be better for this particular example.

Where appropriate and available, intrinsics are often better (this was merely a simple hypothetical example) - although they are normally restricted to single assembly instructions, while your asm function/macro may cover more. Even better, of course, is when the compiler can recognise patterns in the C source and generate the code directly (good optimisation of DSP-style code would be an example).

Thus the compiler is able to move calls to this function/macro
around, such as pulling constant calls outside of loops or branched
code, and avoiding executing the code more than strictly necessary.
The compiler is also able to provide the assembly code with
registers that it may use - that way, there is no need to save or
restore registers unnecessarily around the assembly code.

So any compiler that is able to handle inlined assembly at least as
well as gcc must have some way to distinguish between assembly
sections that *can* be optimised and moved around in this way, and
those that cannot (unless it directly analyses the effects of the
assembly code, which is pretty unlikely).

Actually analysing the effect of the assembly code is the only
correct way of dealing with inline assembler. No other approach can
be called "optimizable inline assembler". Basically inline assembler
and compiler optimization are mutually exclusive things - unless the
compiler understands the effects, constraints, timing, size of each
instruction. Most inline assembler syntax, including GCC, is not
about optimization of inline assembler but an effort to reduce the
negative effect of inline assembler on the rest of the code.

The term "optimise" is an abuse in itself - the compiler does its best to produce better code, but you have no guarantee of truly "optimal" code. But it's fair enough to describe the inline assembly features as reducing the negative consequences of the asm statements rather than optimising them as such - the asm code itself is not changed by the compiler (though the registers used, and possibly the addressing modes, may change). The point of getting the gcc asm syntax accurate is to allow the optimiser to generate the best possible assembly for the C code around the inline asm, as well as to be able to move or remove the asm statements when possible.

I designed an inline assembler which is 100% optimizable and
seamlessly integrates with C/C++ (each operand can be a C
expression). This means each inline assembler instruction is treated
like any other instruction and subject to exactly the same set of
optimizations. All this happens automatically without needing any
arcane syntax.

Even though it is more than 10 years old now, it is still the most
advanced inline assembler you'll ever see. It upset quite a lot of
people, as the compiler was able to optimise inline assembler
instructions when they expected it to behave like a real assembler.

It's certainly possible to do this (obviously, since you've done it!). If nothing else, many assembly instructions can be directly translated into C (with registers replaced by local variables), and the C optimiser will do a reasonable job. Of course, there is normally little point in doing this - if the code can be written in C in the first place, you don't want to use inline assembler. The use of inline assembly is mainly for constructs or instructions that don't translate well into C.

I'd imagine messing with the assembly would upset some people - I've regularly met people on mailing lists who insist on turning off optimisation on their compiler because they don't like the compiler being "smart", and believe that the compiler should (and will) do exactly as they say. When people get like that about their C code, they'll be a lot worse about their assembly!

I don't have detailed information on more than a couple of high-end
commercial compilers on hand, and neither of them supports
optimisable asm statements. Thus all their "asm" statements and
expressions are, in gcc terms, "asm volatile". The manuals
stressed that inlined assembly would block the optimiser, and that
you probably want to compile such code without optimisation at all.
In other words, their support for inline assembly is not nearly up
to the levels of gcc. On the other hand, they did have more
pre-defined intrinsic functions which can avoid the need for inline
assembly in some cases.

All inline assembler that isn't analysed by the compiler is by
definition "asm volatile". It has a negative effect on the rest of
the code, as the compiler never knows exactly what it does (e.g.
changing the stack pointer!).

The compiler *does* know a fair amount about the assembly - based on what you tell it (with gcc asm constraints). In particular, it *does* know if you are changing memory or registers, and it knows if you are causing other side effects. For example, it knows when your code has no side effects (when your asm statement is not "volatile" and it has an output). It also knows that you are not changing the stack pointer (except temporarily), or other registers - because you'd include them in the clobber list if you were. There are other things it does not know, such as the size or timing of the assembly section, and there are limits to the descriptions (you either clobber all memory or no memory - you can't tell the compiler you're only clobbering some of it).

There are two places where you need it. First: If you write
inline asm-code
That's a silly thing to do anyway.
Ah well, silly me. And silly compiler writers (a great many of
them) that wasted all that time and effort adding features to
allow silly programmers to write silly inline assembly code.

Chris has a point here: many uses of inline assembler are
unnecessary. Intrinsics are a far better way of using additional
instructions, and when you do need to squeeze the last few percent of
performance, it's best to avoid the complexity and drawbacks of
inline assembler and use a real assembler.

As I said above, intrinsics are often a better choice - when they are available, and when they are appropriate. However, inline assembly in gcc allows you to express things reliably and completely (though the syntax can be a bit daunting), and means that low-level code can often be expressed by a source- or header-level inline assembly definition, rather than a compiler-level intrinsic.

For example, on the ColdFire there are a number of special registers used for low-level setup, such as setting the base address of the internal RAM, or cache control. These registers must be accessed with special assembly instructions. On gcc this is easily done with inline assembly, while other compilers perhaps provide a specific intrinsic for each special register. When a new ColdFire comes out with a different internal address for the same named special register, the intrinsic-based compiler must be modified to support the new core - whereas with inline assembly, it is the program's source code that changes. From experience of exactly this situation, I know which system I find better - buy an update to the commercial compiler, or modify a single line of source code?
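A hedged sketch of the ColdFire case described above. RAMBAR and the privileged movec instruction are real ColdFire features, but the macro name and the address are purely illustrative, and this fragment only assembles with a ColdFire-targeted gcc:

/* Illustrative only - builds only for a ColdFire target.
   RAMBAR (internal-SRAM base register) can only be written with the
   privileged movec instruction, so C needs one line of inline asm: */
#define set_rambar(addr) \
    __asm__ volatile ("movec %0, %%rambar" : : "r"(addr))

/* When a new part puts the SRAM elsewhere, only the one source line
   that calls set_rambar() changes - no compiler update needed. */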

Overly using inline assembly, especially in the name of "optimisation", is silly, of course. But appropriate use is a good thing - it can mean that traditional assembly code such as C startup routines can be written in C with a couple of lines of inline assembly, rather than pure assembly.