Re: Interesting article by Randall Hyde




Robert Redelmeier wrote:
> randyhyde@xxxxxxxxxxxxx <spamtrap@xxxxxxxxxx> wrote:
> >> a test before executing the loop. That test is required
> >> by K&R App A "c reference manual" 9.6
> >
> > What makes you think the test is required?
>
> The words "; the second specifies a test, made before each iteration,
> such that the loop is exited when the expression becomes 0;"
>
> That sounds like a requirement to me.

No. It is not a requirement. Language specifications are pretty good
about specifying semantics and leaving the actual code generation up to
the compiler's implementor. For example, is a compiler writer forced to
compare the value one against zero in the following for loop:

for( ;1; )

And what happens in the canonical infinite loop in C?

for(;;)

The statement you quote above discusses the C *source* language, not
the code that the compiler generates.

And another thing you might consider is that compiler optimization was
in its infancy when K&R was written, and they probably weren't even
thinking about it at the time, or they might have been a bit more
precise in their book describing the C language. Finally, note that
K&R is *not* C as we know it today. The ANSI spec is a *lot* clearer on
the semantics of optimizations that can be performed.

C compilers do eliminate the for loop. Entirely in some cases. Consider
the following code:

int main( int argc, char **argv )
{
int i;
char array[4];

for(i=0; i<4; ++i)
array[i] = 0;

printf( "%d %d\n", array[0], array[3] );
return 0;

}


Here's the x86 assembly that MSVC emits for the body of the main
function:

; Line 9
xor eax, eax
mov DWORD PTR _array$[esp+4], eax
; Line 11
movsx ecx, BYTE PTR _array$[esp+7]
movsx edx, al
push ecx
push edx
push OFFSET FLAT:??_C@_06CMFP@?$CFd?5?$CFd?6?$AA@ ; `string'
call _printf

Whoa! Where did the for loop go? Well, the compiler was smart enough to
realize that writing a dword containing zero to the array variable did
all the work (much more efficiently) of the for loop. This type of
optimization is *hardly* particular to MSVC. Most decent compilers can
pull tricks like this.

In cases where tricks like this aren't possible, compilers can still
use loop induction variable optimization to completely change the type
of test they perform in the for loop. Consider the following C code:

#include <stdlib.h>

struct
{
char padding [8];
int anInt;
} aStruct[4];

int main( int argc, char **argv )
{
int i;

for(i=0; i<4; ++i)
aStruct[i].anInt = i;

printf( "%d %d\n", aStruct[0].anInt, aStruct[3].anInt );
return 0;

}

Here's the x86 code that MSVC emits:

; Line 13
xor ecx, ecx
mov eax, OFFSET FLAT:_aStruct+8
$L837:
; Line 14
mov DWORD PTR [eax], ecx
add eax, 12 ; 0000000cH
inc ecx
cmp eax, OFFSET FLAT:_aStruct+56
jl SHORT $L837
; Line 16
mov eax, DWORD PTR _aStruct+44
mov ecx, DWORD PTR _aStruct+8
push eax
push ecx
push OFFSET FLAT:??_C@_06CMFP@?$CFd?5?$CFd?6?$AA@ ; `string'
call _printf

Well, you can see that the for loop was emitted to this assembly
output. But note that nowhere is there a test for "i<4". The test for
the for loop checks to see if a pointer (held in EAX) exceeds an
array's bounds. Again, this is a *typical* induction variable
optimization that most modern compilers will do. Believe me, if the
specs *ever* prevented this type of optimization, the compiler writers
would have been all over the specification design (actually, they have
been, but for other issues).



>
> > If this is the case, then most optimizing C compilers are
> > severely broken.
>
> Getting in trouble from taking shortcuts is very common.
> The compiler may be standards compliant only at -O0 .

The compiler may not be compliant at all. MSVC, for example, is famous
for skirting the rules. In this particular case, however, MSVC (Gas,
Borland, and others) is well within the specs to skip with the loop
test if it is unnecessary.


>
> > It's precise in its unpreciseness, eh?
>
> Exactly!

Well, then you have a strange definition of preciseness.
Me thinks you've moved *way* to far to the defensive side here,
unwilling to admit that you overstated things a bit.

>
> > Quite the opposite. I'm advocating that programmers take
> > more responsibility for optimizing their own code rather than
> > relying upon their compilers to handle this stuff for them.
>
> Always good.
>
> > a programming language construct that is easier to read (but
>
> "Easier to read" is a value judgement. People have different
> values. Compilers have even more different ones.

The common consensus in most software engineering texts is that the
closer you move a computation to the point where the result of that
computation is used, the better. People have done a lot of research on
this, so I'm willing to take their word over a debate in an assembly
language programming newsgroup on USENET (the pinnacle of truth!).


>
> > Arguably, for example, leaving the call to strlen in the
> > for loop is a good idea (from a readability point of view)
> > because it keeps the calculation of the length as close to
> > the point of use as possible.
>
> But it means something different. What happens if this nice
> proggie is multithreaded, and other threads change strlen()?
> Then the `for` semantics is essential. [But watch for fireworks
> when strlen() get smaller].

Then the programmer will have to protect the execution of that loop.
Note that the process that interrupts the for loop could also be
interrupting the strlen calculation itself. If such multithreaded
applications are pouncing on each other's strings, they're going to
break. Moving the strlen calculation out of the loop may actually help
*improve* the situation (not fix it, but make the problem less likely
to occur) because you're only exposed during the strlen function once
rather than on each iteration of the loop. And strlen is *not* thread
safe (at least, the version that MSVC compiles into a scasb
instruction). Let's look at the x86 code that MSVC emits for the for
loop in question (gets and putch are added to prevent the compiler from
optmizing away the code):

#include <stdlib.h>


int main( int argc, char **argv )
{
int i;
char s[256];

gets( s );
for( i=0; i<strlen(s); ++i )
{
putch( s[i] );
}
return 0;

}


Here's the output x86 code:

_s$ = -256
_main PROC NEAR
; File t.c
; Line 5
sub esp, 256
; Line 9
lea eax, DWORD PTR _s$[esp+256]
push esi
push edi
push eax
call _gets
; Line 10
lea edi, DWORD PTR _s$[esp+268]
or ecx, -1
xor eax, eax
add esp, 4
xor esi, esi
repne scasb
not ecx
dec ecx
je SHORT $L845
$L835:
; Line 12
movsx ecx, BYTE PTR _s$[esp+esi+264]
push ecx
call _putch
add esp, 4
lea edi, DWORD PTR _s$[esp+264]
or ecx, -1
xor eax, eax
inc esi
repne scasb
not ecx
dec ecx
cmp esi, ecx
jb SHORT $L835
$L845:
pop edi
; Line 14
xor eax, eax
pop esi
; Line 16
add esp, 256
ret 0
_main ENDP

Because scasb is interruptible, this code is *not* thread safe. If the
compiler is going to emit thread unsafe code that computes the string
length, why would it fail to do other optimizations based on the belief
that such optmizations would fail in a multi-threaded environment?


>
> > Manually moving the calculation elsewhere in the program
> > and passing in the computed value adds a small level
> > of indirection to the whole process, making the program
> > slightly more difficult to read.
>
> Vanishlingly so if k = strlen(s) ; is immediate before the
> `for` loop. It makes it extremely clear what is going on.

Not vanishingly so. "for(i=0;i<strlen(s);++i)}" is a bit easier to read
and comprehend than

k=strlen( s );
for( i=0; i<k; ++i )

There's less cruft to deal with in the former example and it requires
less mental processing. There is also the case of someone having to
read the declaration for the k variable and figure out what it's used
for. This may seem trivial in an isolated example such as this one, but
when you have a large program full of constructs like this, it has a
big impact on readability.

>
> One of the problems with `c` is that people like to overload
> lines so that it begins to look like APL. I'm guilty too,
> but I know it's because of semicolonophobia.

Not sure I follow what you mean.
Yes, it takes *work* to write readable C code. By default, it's much
easier to write difficult to read code. That's why it helps if you can
count on the compiler to handle some obvious things (like moving a loop
invariant calculation out of a for loop) so you can keep the code as
readable as possible. This has nothing to do with laziness and
everything to do with thinking about the poor soul who has got to read
your code when you're done with it.


>
> > it's surprising that a compiler such as MSVC doesn't do this.
>
> Perhaps because strlen() is modifiable? It is not a reserved word.

This has nothing to do with moving the code that computes strlen (see
the previous example) out of the loop. Whether strlen is computed
in-line, a function is called, whatever, the calculation *can* be moved
out of the loop as the value of s does not change.

BTW, although strlen is not a keyword in C, it is a *special*
identifier that the compiler is allowed to recognize and treat
specially (as MSVC and other compilers certainly do).

>
> > Well, compiler vendors beg to disagree. In their opinions, people
> > use C/C++ because of its reputation as a fast language.
>
> Of course a portable assembler is going to be fast. And loose.

In 1971 the definition of C was loose. It's been tightened up *just* a
bit since then.

>
> > people have been told that optimizing compilers do such
> > a great job that human coders can't compete.
>
> I think you've address this well in "The Great Debate".
>
> In a knutshell, the complier helps the `c` newbie write much
> faster code than the ASM newbie (mostly buffered libcalls).
> However, the master `c` programmer is whipped by the ASM master.
> People stick with `c` for other reasons (maintainability,
portability,
> availability of `c` novices to sacrifice to it).

Beginning programmers are going to write lousy code; that's to be
expected. The problem with the "great debate" is that it *falsely*
gives HLL programmers a "security blanket" and lets them think that
they don't have to worry about optimization at all. All the time they
argue about how great their compiler is doing without really knowing
what's going on behind their backs.

A *good* C programmer, who also happens to know assembly language, and
write some *damn* good C code by hand optimizing their C statements to
produce good machine code output (e.g., as has been claimed for the
Linux Kernel). My complaint about this for loop example is that this
is an optimization one should *expect* a compiler to do (and some
compilers *will* do it; this isn't the most difficult concept in
compiler optimization theory). The point being is that a good C
programmer will find it difficult to find a common set of optimizations
they can count on, so they can move their C source files from compiler
to compiler without worrying about degenerate behavior. In Write Great
Code, Volume II, for example, I have to tell the programmers that they
need to sacrifice a little readability and manually watch for loop
invariant calculations because under some circumstances, compilers
won't do the job for them. Again, this for loop example is rather
trivial. It's easy enough, however, to create slightly more complex
examples where the readability takes a big hit because you've got to
move a loop invariant calculation of out a (deeply nested, for example)
loop.



>
> > never hear the argument "If you're using the *right* compiler
> > and you write your HLL code in a careful fashion then *much*
> > of the time the compiler produces code as good as a human
> > assembly programmer."
>
> And this is a good argument for understanding the quirks of
> compilers. The best way I know is to look at their ASM output.

And that's what a large part of my new book is all about.
Cheers,
Randy Hyde

.



Relevant Pages

  • Re: [EGN] Hoisting Loop Invariants (Was: Re: [EGN] Numerical Accuracy)
    ... compiler out there somewhere that did as you claim. ... > the programmer has this knowledge, then the programmer should not use ... >> string in a loop, regardless of the blatant inefficiency of doing so. ...
    (comp.programming)
  • Re: Brian Kernighan, maybe Im not worthy, maybe Im scum
    ... and just hone in on the stuff related to programming and this newsroup] ... moron that was taken from optimization which does hoist when to do so ... compiler design and optimization, including my 1976 text in graduate ... loop in a language in which the designers messed up, ...
    (comp.programming)
  • Re: Brian Kernighan, maybe Im not worthy, maybe Im scum
    ... what experienced programmers do, ... optimization, ... Thugs" ad nauseum fits that a lot more closely than discussing compiler ... be modified outside a loop, and guessing ...
    (comp.programming)
  • Re: Why is C# 450% slower than C++ on nested loops ??
    ... The posted benchmark was crucial to ... > compilers generate for the loop and get over with it. ... > additions in the outer loops, which the C# compiler doesn't. ... gotten around to implementing every possible optimization in every language, ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Order of function evaluation
    ... The optimizer in the compiler is allowed ... This is a typical optimization in any compiled, imperative language. ... >There were compilers that would evaluate sqrtoutside the loop, ... Of course, in any programming language, the programmer can hoist ...
    (comp.lang.c)