Re: The annotated annotated annotated C standard part 4
- From: "Clive D. W. Feather" <clive@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 25 Jan 2008 19:49:33 +0000
[Allegedly we aren't going to get a part 5, but I'm still going to
follow up to this.]
In article
<33a2ad6d-82ac-4c23-bf1c-9a7270870c36@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
spinoza1111 <spinoza1111@xxxxxxxxx> writes
-Schildt-> In other words, the executable version of a C program
## contains a table that contains the string literals used by the
## program.
While this is one way to implement strings, it is not the only one.
Such a comment does not belong in a book like this.
-Nilges-> Do tell us...what might those be? Unless string literals
could be compiled into instructions, which would require instructions
to be variable-length, then some sort of "table" or data structure is
required. Schildt as a mentor is merely trying to help the student
visualize the process, and this is absolutely essential to
understanding.
The strings could be scattered through the code at convenient points. It
doesn't require variable-length instructions. For example, the resulting
assembler could be:
        call load_r1_with_string
        db "String"
        db 0
        // Next instruction
The function load_r1_with_string pops its return address into a
register, increments it to walk along the string until it finds a zero
byte, then pushes it back on to the stack for the normal return
mechanism to find.
I used this approach in a command interpreter many years ago. We were
very tight for EPROM space (some idiot hardware designer had tied A13 on
the socket to Vcc instead of connecting it to A13 on the bus, so we
could only use 8kB EPROMs instead of 16kB, and had to get an entire OS
and CLI into there), so the saving this gave you - there were lots of
strings to be printed - was worthwhile.
In fact, we went one step further: the Z80 had 8 addresses which could
be called with a one byte opcode rather than needing three (opcode plus
address). So I used one of them as what I called "supercall" - this
opcode followed by a single byte code number caused a jump or call to an
address taken from a look-up table. If an address was called N times in
the code, this took 2N+2 bytes rather than 3N, so any address used three
times or more meant a saving.
[End digression.]
Okay, it's not the strongest complaint about the book, but I still think
it's valid.
-Schildt-> Further, the effect of changing the string literal table
## is implementation dependent. The best practice is to avoid
## altering the string table.
It's more than just implementation dependent (a term which, by the
way, is not used by the standard), it's completely undefined. You
must not modify a string literal.
-Nilges-> A less brutal phrasing of the same fact, where "must not" is
used to talk to children and not working adults, is counted as an
error in the ongoing campaign of personal destruction.
"Best practice is to avoid" implies that you can do it, it's just not
pretty. The standard says that if you do it, all bets are off. That's a
"must not" in my opinion. I feel that the book's wording is far too
weak.
By the way, note that "abcde" and "cde" don't have to be separate
strings in the program, however they're done. Changing the latter to
"cdf" could cause the former to change to "abcdf". Another reason that
it's undefined behaviour, not just bad practice.
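As a hedged illustration of the safe alternative: to get a string you may change, initialise an array from the literal, so that the program owns a private copy. The helper name make_cdf is made up for this sketch; the undefined version appears only in a comment.

```c
#include <string.h>

/* Hypothetical helper: builds "cdf" in the caller's buffer by editing
   a private copy of "cde". The literal itself is never written to. */
void make_cdf(char out[4])
{
    char copy[] = "cde";            /* array initialised from the literal: writable */
    copy[2] = 'f';                  /* modifies the array, not the literal */
    memcpy(out, copy, sizeof copy); /* sizeof copy is 4, including the '\0' */
}

/* By contrast, this would be undefined behaviour:
       char *p = "cde";
       p[2] = 'f';
   and because "abcde" and "cde" are allowed to share storage, it could
   also change what "abcde" appears to contain. */
```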
6.2.1.2
A description which is essentially correct is spoilt by the addition
of the words:
-Schildt-> In the most general terms, when you convert from a larger
## integer type to a smaller type, high-order bytes are lost.
When an integer value is converted to a signed type which can't hold
that value, the result need not be that given by removing some bits.
For example, a rule that converted all such values to the minimum
value of the destination type (SCHAR_MIN, SHRT_MIN, INT_MIN) would be
conforming.
A simpler way to state what this section means is:
If the source value can be represented in the destination type, it is
unaltered.
Otherwise, if the destination type is unsigned, reduce the value
modulo U<type>_MAX+1.
Otherwise the destination type is signed and the value is
implementation defined.
-Nilges-> What part of "in the most general terms" don't you
understand? Schildt's definition is by far better than yours.
No it isn't.
When you convert from a larger integer type to a smaller *unsigned*
integer type, then the high-order bytes are lost. Well, if the value is
positive or the representation is twos-complement - if it's
ones-complement or sign-and-magnitude, the implementation has to do more
work.
[Just to make this completely clear: in the three systems and in 32
bits, -4 is represented as 0xFFFFFFFC, 0xFFFFFFFB, and 0x80000004
respectively. When converted to a 16 bit unsigned type, the result must
be 0xFFFC (with 8 bits, it must be 0xFC) *irrespective* of the
representation.]
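The modulo rule can be sketched in C using arithmetic only, without looking at any bit pattern. This is an illustration, not text from the Standard: the helper name reduce_to_uchar is made up, and the sketch assumes long can represent UCHAR_MAX+1.

```c
#include <limits.h>

/* Hypothetical helper: reduce a (possibly negative) value modulo
   UCHAR_MAX+1. Assumes long can represent UCHAR_MAX+1. */
unsigned char reduce_to_uchar(long value)
{
    long modulus = (long)UCHAR_MAX + 1;
    long r = value % modulus;   /* r may be negative when value is negative */
    if (r < 0)
        r += modulus;           /* bring r into the range 0..UCHAR_MAX */
    return (unsigned char)r;    /* r is in range, so the value is unaltered */
}
```

A plain cast, (unsigned char)value, is required to give the same answer on every representation, which is the point of the paragraph above.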
When you convert from a larger integer type to a smaller *signed* type
and the result won't fit, ANY VALUE CAN BE GENERATED (provided that the
compiler documentation tells you what it is). It is perfectly legitimate
for an implementation to "clamp" the result to the largest positive or
negative value. Thus converting 0x76543210 to signed 8 bits might
produce the value 0x7F rather than Schildt's 0x10.
Again, the statement in the book is correct for MS-DOS and other
compilers for specific architectures, but it's not what the Standard
says.
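Code that needs one particular result for out-of-range values can say so explicitly instead of relying on what the cast happens to do. A minimal sketch of the clamping behaviour, with a made-up helper name:

```c
#include <limits.h>

/* Hypothetical helper: saturating conversion of a long to signed char.
   Every path returns a value the destination type can represent, so no
   implementation-defined conversion is involved. */
signed char clamp_to_schar(long value)
{
    if (value > SCHAR_MAX)
        return SCHAR_MAX;
    if (value < SCHAR_MIN)
        return SCHAR_MIN;
    return (signed char)value;  /* in range: value is unaltered */
}
```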
-Nilges-> Many programmers in the real world will not understand an
unexplained use of Modulo while they would understand "division
remainder". It is contemptuous of them for you to propose this as a
rewrite.
"Modulo" is the term in the Standard. But I wouldn't care if a rewrite
said "division remainder" instead. The important point is that the
"loses the high-order bytes" bit is completely wrong.
-Nilges-> And given the overblown claims for C standards, we
nonetheless discover that "the value is implementation defined". Oh
well. It sounds to me that you damaged C.
Only because you can't accept the concept of different implementations
doing different things. "implementation-defined" means "the compiler
writer can do whatever seems sensible on her system, provided that she
puts it in the manual".
6.2.1.4
-Schildt-> When converting a larger [floating] type into a smaller
## one, if the value cannot be represented, information content may
## be lost.
Actually, unlike integers, such conversions are undefined, and the
program may crash as a result.
-Nilges-> Only two real errors so far. A great deal of ink wasted
UNLESS the goal is death to Herb.
A lot more than two, actually.
6.2.1.5
-Schildt-> these automatic conversions are also intuitive.
These conversions have been the subject of much debate. This section
would benefit from a proper explanation of the "value preserving"
rules, and why they were chosen.
-Nilges-> It was your job, not Herb's, to explain them in full detail.
Wrong. It is the job of the Standard to state the rules unambiguously.
It does.
It is not the job of the Standard (though it is a worthy aim when that
doesn't conflict with other goals) to be easy to understand.
In this case, Herb wrote "intuitive" for something that very clearly
isn't so, because well-intentioned people suggested very different
approaches to this section. I don't, after all this time, remember
exactly what else he wrote there but, based on my comment, I don't think
he explained the conversions properly.
Herb's job was to assure his readers, insofar as he could, that you
hadn't messed things up completely. It appears to have been a meta-
Augean task, because you may have.
His job isn't to assure his readers anything. His job - based on the
title and structure of the book he wrote - is to annotate the C Standard
so someone unfamiliar with it can better understand it. In this case, in
my opinion, he failed.
6.2.2.1
-Schildt-> First, an array name without an index is a pointer to the
## first element of the array and is not an lvalue.
This has to be one of the worst expressions of the Rule I have ever
seen !
At the time I wrote this, "the Rule" was a term that every comp.std.c
reader would be familiar with. "The Rule" says that, in *most* contexts,
the name of an array is converted to a pointer to its first element.
Many people dislike this feature of C, but it's been there since day 1
of the language.
First, there are a number of contexts (such as sizeof) where
an array name does not get changed to a pointer. Second, if the decay
to a pointer takes place at all, it takes place whether or not there
is an index; for example, decay takes place when the array name is
used as a function argument. Last, an array name is an lvalue; it is
the resulting pointer that is not.
-Nilges-> Schildt is not trying to express a mess. He's trying to
explain it, which is something different.
If so, he failed. He wrote "an array name ... is not an lvalue" when it
very definitely is. He failed to explain that the name is *converted* to
a pointer. He introduced the "without an index" nonsense. And he failed
to mention all the places where an array name, with or without an index,
is *NOT* a pointer to the first element of the array.
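Two of those contexts can be shown side by side. This is a sketch with made-up names (table, table_bytes, first_element), not an example from the book:

```c
#include <stddef.h>

static int table[10];

/* sizeof applied to the array name: no conversion takes place, so the
   result is the size of the whole array, not of a pointer. */
size_t table_bytes(void)
{
    return sizeof table;
}

/* An array name used as a function argument is converted to a pointer
   to its first element, with no index in sight. */
int *first_element(int *p)
{
    return p;
}
```

Calling first_element(table) hands the function &table[0], while table_bytes() still sees all ten elements.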
YOUR descriptions here and
elsewhere are useless. In all too many cases they describe things as
idiomatic and as mysteries which it is no business of the programmer
to know.
I was explaining why the annotation was wrong. I wasn't trying to write
his book for him.
6.2.2.3
Considering how often they are used, the rather peculiar way they are
specified, and the need to cast them in some contexts but not others,
it is odd that null pointer constants are not mentioned at all.
-Nilges-> Perhaps. But again a matter of selectivity and style is
counted as an error.
No, it's counted as a matter of selectivity where I think he made the
wrong choice. The Standard's approach to null pointer constants is - in
my opinion - a complete mess. Unfortunately, it's a mess that was
imposed on the authors by the existing compiler writers.
6.3
-Schildt-> The standard states that when an expression is evaluated,
## each object's value is modified only once. In theory, this
## means the compiler will not physically change the value of a
## variable in memory until the entire expression has been
## evaluated. In practice, however, you may not want to rely
## on this.
The book then in effect goes on to say that "i = ++i + 1" is usually
compiled as if it were "i += 2".
As anyone who has survived the "i = i++" thread on comp.lang.c knows,
this is not only nonsense, but dangerous nonsense. The correct way to
discuss this part of the standard is to point out what can and can't
be done in a strictly conforming program, and leave it at that.
Suggesting that such code can ever have a defined answer is asking for
trouble.
-Nilges-> The pathological code DOES have a defined value for each
compiler.
We've been through this. It does not. End of story.
Herb DOES advise against the bad practise.
With weasel-words. He suggests that the Standard defines it, but some
compilers get it wrong - that's certainly how I would read "in practice,
however, you may not want to rely on this". It implies that I should be
able to rely on it, were it not for unfortunate circumstances.
Herb is right and
you are wrong.
No, he's completely wrong. The Standard allows implementers as much
freedom as they want here. The same code can produce different results
in different places. It can produce different results in the *same*
place (due to instruction cache and bus timing issues). NO COMPETENT C
PROGRAMMER SHOULD EVER WRITE SUCH A LINE OF CODE.
Everything's "defined". It's not black magic,
howevermuch that would suit those who don't understand their business.
No, it's not black magic. However:
"We demand rigidly defined areas of uncertainty and doubt."
Mr. Adams had it right.
-Nilges-> Since it's malpractice to write new code in C,
We're back on the strange ideas again.
the actual
job of many C programmers is to maintain old code, most of which is
nonconformant. The code has a defined result every time it runs, and C
programmers need to know the range of what could happen. Your time
would have been better spent not "standardizing" C but treating it
more as a linguist treats natural language.
This completely fails to understand what a Standard is.
-Nilges-> You would have served genuine needs had you gone out in the
field and described what major compilers DO. This is what Herb is
doing.
That may well be what he's doing (or, more precisely, *one* common
compiler), but *he didn't say so*. He claimed to be describing, or
annotating, the Standard. And that is very different.
-Schildt-> The rest of this section formally defined what type of
## lvalue can refer to an object.
Well, in one sense this is true. However, what is important is why
only some lvalues can refer to a given object, and the annotations
completely skip this. The reason is, of course, to indicate when a
compiler can assume that two identifiers refer to the same object.
For example, in:
    char *cp;
    int *ip;

    void f (double *d)
    {
        *d = 3.14159;
        *cp = 1;
        *ip = 2;
    }
The rules of this section say that the assignment to *cp could
potentially alter *d, and the compiler must generate code that takes
that into account, but the assignment to *ip cannot, and the compiler
may assume that *d and *ip do not overlap. This is called aliasing,
and knowing when aliasing takes effect is an important factor in
correctly optimising code.
-Nilges-> The only way of "correctly optimizing code" is not to use
aliasing so pathologically but to intelligently use an optimizing
compiler. The compiler determines whether the lValues can refer to the
same object.
It can't always do so. That's why the rules are there.
The programmer should avoid using global variables as
much as possible;
I think that's too strong a statement, though I see where you're coming
from.
this is the real lesson of the above crap code,
along with the need to organize things into structs when they are
global.
This is nothing to do with structs.
-Nilges-> Intelligent programmers minimize global variables existing
in the implicit namespace. One way to do this is to always put things
in structs, because this lowers the probability of accidents based on
aliasing. The structure members can't be referred to "by accident",
only the structure instance name.
Putting everything in structs wouldn't have changed the issue with that
code, and that section of the Standard, one iota.
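To illustrate, here is the same example with the two pointers moved into a struct. The names struct globals, g and f_with_struct are invented for this sketch. The aliasing question is unchanged, because it depends on the types of the lvalues used for the stores, not on where the pointers live.

```c
struct globals {
    char *cp;
    int  *ip;
};

struct globals g;

void f_with_struct(double *d)
{
    *d = 3.14159;
    *g.cp = 1;  /* a char lvalue may access any object, so this may alter *d */
    *g.ip = 2;  /* an int lvalue may not access a double, so the compiler
                   may still assume *d and *g.ip do not overlap */
}
```

If g.cp is pointed at the first byte of the double passed in, that byte ends up holding 1, exactly as it would with the original global cp.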
[Interesting theories on language design, plus infantile insults,
removed because it's not relevant. If you want to start a language
design thread without insults, I may well join in.]
6.3.2.2
-Schildt-> When no prototype for a function exists, it is not an
## error if the types and/or number of parameters and arguments
## differ.
## The reason for this seemingly strange rule is to provide
## compatibility with older C programs in which prototypes do not
## exist.
On the contrary, when no prototype exists, the number of arguments to
a call must be the same as the number of parameters in the function
(which cannot be a varargs function), and the types must be compatible
after promotion. What should have been written is that no error
message is required if these rules are broken.
-Nilges-> What's "broken" is a standard that says "no error message is
required if these rules are broken". What's "broken" is a standards
committee which comes up with that crap.
I can't tell you why it was decided not to require an error message.
Looking back over many years, I agree it seems strange. But I'm sure
there was a good reason.
However, that doesn't alter the fact that Schildt's annotation is just
plain wrong.
6.3.2.3
Though this section mentions the existence of the "common initial
subsequence" rule for unions, it does not explain it properly, nor
does it mention that in all other circumstances assigning to one
element of a union makes all other elements have undefined values.
-Nilges-> More amateur literary criticism added to pad the rap ***.
In your opinion. Not that there ever was a "rap ***".
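For readers who have not met it, the "common initial subsequence" rule under 6.3.2.3 can be sketched like this. The type and function names are hypothetical; the point is only that the leading member shared by both structs may be inspected through either union member.

```c
struct circle { int kind; double radius; };
struct rect   { int kind; double width, height; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Reads kind through c even if r was the member last stored. This is
   allowed because both structs begin with the same sequence of member
   types and the union declaration is visible here. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}
```

Reading s->c.radius after storing through r is the "all other circumstances" case, and is deliberately not done here.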
6.3.6
There is no mention of the rule that addition and subtraction of
pointers and integers must yield a pointer to the same array or one
past the end of the array.
-Nilges-> Probably because this rule is not enforced in C, which has
no bounds checking whatsoever and as such is an infantile disorder,
that should not be standardized, because that gives a false illusion
as to its usability.
I've used bounds-checking C systems, so don't tell me that the rule
isn't enforced.
No, the Standard doesn't require the run-time system to do lots of
checks. Given the context in which C was created, and the reasons it was
used, that's a sensible decision. Equally, it doesn't forbid run-time
checks either; that's the whole point of the wording in 6.3.6.
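A minimal sketch of the rule in use, with a made-up function name: the loop forms the one-past-the-end pointer and compares against it, but never dereferences it.

```c
#include <stddef.h>

/* Sums n ints starting at a. The pointer a + n is "one past the end":
   it may be computed and compared, but not dereferenced. */
long sum(const int *a, size_t n)
{
    const int *end = a + n;
    const int *p;
    long total = 0;

    for (p = a; p != end; ++p)
        total += *p;
    return total;
    /* Forming a + n + 1 or a - 1 would be undefined even if the result
       were never dereferenced; a bounds-checking implementation is
       entitled to trap on it. */
}
```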
6.3.7
-Schildt-> When right-shifting a negative value, generally, ones are
## shifted in (thus preserving the sign bit), but this is
## implementation dependent.
The result of signed right shift of a negative number is
implementation defined; there is no suggestion in the standard that
shifting in ones is the "best" thing to do.
-Nilges->Herb doesn't suggest that this is "best", although it clearly
is.
What is "generally" meant to suggest, then, if not the preferred choice?
Do tell us what should be shifted in.
Whatever the implementer thinks is the best thing to do. This might be
shifting in ones, it might be shifting in 0s. It might be dividing by 2
to the Nth power, which is neither. It all depends on what instructions
he has available.
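A program that needs one specific answer can compute it without ever right-shifting a negative value. The sketch below picks "round towards minus infinity", which is only one of the possible choices; the helper name floor_shift is made up, and it assumes n is non-negative and less than the number of value bits in int.

```c
/* Hypothetical helper: floor(value / 2^n) using only operations whose
   result the Standard defines. Assumes 0 <= n < value bits of int. */
int floor_shift(int value, int n)
{
    if (value >= 0)
        return value >> n;      /* non-negative operand: fully defined */

    /* -(value + 1) is non-negative and cannot overflow, even when
       value is INT_MIN, so the shift below is also fully defined. */
    return -1 - ((-(value + 1)) >> n);
}
```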
If negative numbers are
zeroes-complement, then zeroes should be shifted in. Get a life.
Zeroes-complement?
6.3.13
There is no mention of the fact that && and || evaluate explicitly
left to right, and stop when the result is known. This would be an
opportunity to discuss sequence points, but the opportunity is
missed.
-Nilges->A definite omission on Herb's part, although "sequence
points" aren't computer science. Not meriting the attack on his
reputation.
No attack. This would have been a good opportunity to talk about an
important subject.
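The kind of example such a discussion might have used, as a sketch with a made-up function name: the right operand is safe only because && guarantees that the left one is evaluated first and that evaluation stops if it is false.

```c
#include <stddef.h>

/* Returns 1 if p is non-null and points to a positive value, else 0.
   && evaluates p != NULL first, with a sequence point after it, and
   evaluates *p > 0 only if the left operand was true. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```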
6.3.16.2
When talking about compound assignments (+= etc.), the annotations
mention that "a += b" means the same as "a = a + b", but do not point
out that the two are not equivalent; for example, "*a++ *= 2" is
strictly conforming code which increments a once, while "*a++ = *a++ *
2" is not.
-Nilges->I really hope you never teach programming.
Too late.
And the point remains: "a += b" is equivalent to "a = a + b" sometimes
but not always. "*f() += b" will call f once; "*f() = *f() + b" will
call f twice.
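The difference is easy to observe by counting calls. In this sketch the variables cell and calls and the wrappers compound and expanded are made-up names; f always returns the same address, so both forms store the same value and only the call count differs.

```c
static int cell;
static int calls;

/* Stand-in for a function with a side effect: counts its own calls. */
int *f(void)
{
    ++calls;
    return &cell;
}

/* Compound form: the left operand is evaluated once, so f is called once. */
void compound(int b)
{
    *f() += b;
}

/* Expanded form: f is called twice, in an unspecified order. */
void expanded(int b)
{
    *f() = *f() + b;
}
```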
6.3.17[...]
Again, there is no mention of sequence points.
-Nilges->"Sequence points" are an intellectual hack
We've heard this rant before, and I see no point in going through it
again.
-Nilges->I'm aware that some clown put an article about "sequence
points" in wikipedia. Oops, was that you, Clive?
No. I hadn't even read it until a moment ago.
It contains at least one error. In the expression:
f (i++) + g (j++) + h (k++)
it is *not* true that "The values of j and k in the body of f are
therefore undefined". The values are defined; each increment must happen
either before or after the function calls (and, of course, the increment
in each argument must occur before the corresponding function is
called).
So if the implementation makes the calls in the order g, h, f, then:
    j is incremented before g is called
    k may be incremented before g is called OR
        between the calls to g and h
    i may be incremented before g is called OR
        between the calls to g and h OR
        between the calls to h and f
Furthermore, if g can alter the value of k, the value passed to h may be
the value of k before the call to g or after it (and in the latter case
the increment must also be after the call to g). If it's the former but
the incremented value is stored after the call to g, it's *not* the
value put there by g that is incremented.
Yes, this is complicated. Which is why such edge cases should be
avoided. Not that you'd know it from Schildt's book.
6.5
-Schildt-> In simple language, a declarator is the name of the object
## being declared.
In real C, a declarator is everything about the type and name of the
object except the basic type and storage class. For example, in
"static int *p[5];", the declarator is "*p[5]", and includes the
concepts of pointer, array, and size of array as well as the name.
-Nilges-> What part of "in simple language" don't you understand?
The bit that doesn't correspond to the truth.
In "static int *p[5]", Schildt says that the declarator is "p". It
isn't.
6.5.1
-Schildt-> A variable declared using extern is not a definition.
Not only is this wrong, but the annotations to 6.7.2 directly
contradict it, with the correct example of "extern int count = 10;".
-Nilges-> In this case, had Herb said it WAS a definition, you would
have come up with a counter-example where it isn't,
Yes.
since you are
intellectually dishonest and your goal was to destroy Schildt.
No, because neither "is" nor "is not" is the absolute truth. A variable
declared using extern *might* be a definition and *might not* be a
definition. It's as simple as that.
But Schildt is the one who made the claim that he contradicts later, not
me.
-Schildt-> In essence, a static local variable is a global variable
## with its scope restricted to a single function.
Actually, a static local variable is a global variable with its scope
restricted to some block scope; that is, from the end of its
declarator to the closing } of the block it is declared in.
-Nilges-> It's quite possible that Schildt didn't understand that a
variable has block scope. Based on my experience with John Nash's
problem with Turbo C in 1992, the Microsoft compilers Schildt used may
have given variables function scope.
Oh?
Open question: has *anyone* ever come across a C compiler that does
this?
In the IBM tradition, and its
successor tradition, programming languages tended to allocate
variables at the start of function and not support block scope.
C doesn't come from "the IBM tradition", it comes from "the Algol
tradition". With blocks. And block scope.
-Nilges->If Schildt didn't understand this issue, this would be his
worst error and since it errs on the side of caution, and since a
runtime that allocates at function header is logically consistent with
a runtime that allocates at block header for non-pathological code,
This isn't a runtime issue, it's a compile-time issue. The object in
question isn't allocated at function startup (I presume you mean), but
at program startup.
Giving the variable function scope would be different to giving it block
scope. Plenty of non-pathological code could tell the difference (using
the same variable name in nested blocks is *not* pathological, though it
can confuse if done badly).
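A small sketch of code that can tell the difference (the function name is invented). With block scope the two declarations of n are separate objects; with function scope the second would be a conflicting redeclaration of the first.

```c
/* Each inner block has its own static n. Both objects exist for the
   whole run of the program, but each name is visible only in its block. */
int counters(void)
{
    int result;
    {
        static int n = 0;
        n += 1;
        result = n;
    }
    {
        static int n = 100;  /* a different object that shares the name */
        n += 10;
        result += n;
    }
    return result;
}
```

The first call returns 1 + 110 and the second 2 + 120, because both objects keep their values between calls.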
the error merited a POLITE correction, not a hatchet job. McGraw Hill
might have treated your "errata" as something less than a geek tantrum
had you restricted your focus to this issue.
This wasn't "a hatchet job".
-Schildt-> When static is applied to a global variable or function,
## it causes that variable or function to have file scope
The global variable or function has file scope whether or not static
is applied to it. The static keyword causes it to have internal
linkage, which is a different matter.
-Nilges-> The teacher has to convey the important consequences of an
action. Schildt is obviously saying p implies q, not p is equivalent.
He's not obviously saying anything. If the variable or function has file
scope, it has file scope with or without static. If the variable does
not have file scope, it does not have file scope even if you add static.
Schildt's statement is just plain wrong.
-Schildt-> The register specifier is only a request to the compiler,
## which may be completely ignored.
It can't be completely ignored, because whether or not it affects the
way in which the variable is implemented, it is still illegal to take
the address of an object declared register.
-Nilges-> Here, as in the case of i=i++, good programmers have clean
minds and as such certain practises don't occur to them, such as
taking the address of a variable declared register.
*WHY NOT?*
On some computers the real CPU registers have addresses, so why can't
you take the address of a register variable?
Answer: because the register specifier *does* mean something in the
language: it means "address can't be taken".
Yes, I know "register" is an odd word to use for that, and it's a
historical mess, but that's what good annotations are for.
[...]
-Nilges-> Whereas it appears to me at this point that geeks on
standards committees,
Like Herb Schildt? After all, he publicises the fact.
who are on record as claiming that they
standardized C Sharp without programming in it,
The same old lie.
programming becomes a
sniggering matter of counterexamples and intellectual pornography.
Herb could have written "The register specifier is only a request to
the compiler, which may be completely ignored, and don't try to take
the address of a register variable, because historically these were in
non-addressable 'general registers'"... and not have added anything of
real use to the statement.
Better phrasing, off the top of my head, would be:
The register specifier is essentially a request to the compiler to
treat this variable specially because it will be used a lot. The
compiler is free to ignore this request. Whether or not it does so,
you cannot take the address of a register variable because, on some
computers, registers don't have addresses.
-Nilges-> You clearly pre-decided, perhaps based on his personal
style, that you hated his guts, and as a result this is a kangaroo
court, in which anything he does or does not say is used against him.
Yawn.
Did he come into the standards committee once, pick up some hot girl
who wouldn't give you the time of day, and leave?
<FX: falls about laughing hysterically>.
6.5.2.1
There is no mention of the implementation-defined aspects of bit
fields.
-Nilges-> Boo hoo. What implementation shall we use? Your pet chipset?
That's the point. There are issues. Why not describe them?
-Schildt-> This padding must occur at the end, not at the beginning,
## of the object.
Padding can occur anywhere except at the beginning of a structure. In
particular, it can occur between two fields. Of course a union can
only be padded at the end.
-Nilges-> You're playing with words. It so happens that padding inside
a structure is padding at the end of its members. Herb is right,
recursively, and you are wrong, recursively.
He clearly meant at the end of the structure; you're struggling.
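Where padding actually falls can be checked with offsetof. This is a sketch with a made-up struct and helper; the amount of padding is up to the implementation and may be zero, so the code measures it rather than assuming a figure.

```c
#include <stddef.h>

struct sample {
    char c;  /* the first member always sits at offset 0 */
    int  i;  /* padding, if any, may precede this member */
    char d;  /* padding may also follow the last member */
};

/* Bytes of padding between c and i on this implementation. */
size_t padding_after_c(void)
{
    return offsetof(struct sample, i)
         - (offsetof(struct sample, c) + sizeof(char));
}
```

On many implementations with a 4-byte int this reports 3, which is padding between two fields and not at the end of the object.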
--
Clive D.W. Feather | Home: <clive@xxxxxxxxxx>
Tel: +44 20 8495 6138 (work) | Web: <http://www.davros.org>
Fax: +44 870 051 9937 | Work: <clive@xxxxxxxxx>
Please reply to the Reply-To address, which is: <clive@xxxxxxxxxx>