Re: A note on personal corruption as a result of using C
- From: "Clive D. W. Feather" <clive@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Mon, 18 Feb 2008 07:25:41 +0000
In article <87fxvr6m4p.fsf@xxxxxxxxxxxxxxx>, Keith Thompson
<kst-u@xxxxxxx> writes
spinoza1111 <spinoza1111@xxxxxxxxx> writes:
[...]
THE LIES OF C
"A string cannot contain Nuls" Yes it can. Data arrives as strings and
needs to be validated, but C makes it impossible. It becomes
impossible to write effective string validation routines by definition
if the input isn't even a string: garbage in, garbage out was never
truer.
The C standard provides its own definition of the term "string",
namely (C99 7.1.1p1):
I wouldn't bother. Nilges refuses to accept that there are two common
ways of representing strings:
(1) Count plus array of characters; this can include any character in
the string, but is limited by the maximum value of the count.
(2) Array of characters with a terminator; this has no limit on the
string length, but the terminator can't appear anywhere else in the
string.
C uses the second representation, some other languages the first. I
don't know why K&R made that choice, but it may have been to allow
easier pointer manipulation of strings. Nilges, however, believes it was
an anti-Vietnam protest.
Perhaps you'd like to have a way to represent a sequence of characters
that can include embedded null characters. C's standard "string"
doesn't give you this (for example, strlen() computes the length of a
string up to but not including the terminating null character), but
you can certainly create your own data structure that supports this.
For example, you could have a structure consisting of a length and an
array of characters.
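
As a rough illustration of representation (1), here is a minimal sketch of a "length plus array of characters" structure. The names counted_str, counted_str_new and counted_str_count_nul are hypothetical, invented for this example; nothing like them is in the standard library, and the sketch ignores size overflow.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type: a length plus an array of characters.
   The names are illustrative only and come from no standard. */
struct counted_str {
    size_t len;     /* number of characters, embedded nulls included */
    char data[];    /* C99 flexible array member holding the characters */
};

/* Allocate a counted string holding a copy of n bytes from src.
   Returns NULL if allocation fails; release the result with free(). */
struct counted_str *counted_str_new(const char *src, size_t n)
{
    struct counted_str *s = malloc(sizeof *s + n);
    if (s == NULL)
        return NULL;
    s->len = n;
    if (n > 0)
        memcpy(s->data, src, n);
    return s;
}

/* Count the null characters in the sequence, something strlen()
   cannot do because it stops at the first one. */
size_t counted_str_count_nul(const struct counted_str *s)
{
    size_t count = 0;
    for (size_t i = 0; i < s->len; i++)
        if (s->data[i] == '\0')
            count++;
    return count;
}
```

Built from the five bytes of "ab\0cd", such an object reports a length of 5 and one embedded null, whereas strlen() on the same bytes stops at 2.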
Indeed.
One could certainly argue that C's choice of this particular
representation for strings was a bad idea.
I wouldn't go that strong. It has its advantages and one small
disadvantage. I can't remember being harmed by that disadvantage, but,
there again, I'm used to coding around it.
Or perhaps you object to C
choosing to apply a fairly generic term "string" to one specific
representation.
There again, I could object to BCPL using the term "string" to mean
something that can't have a length greater than 255 characters. So long
as the context is clear, I don't think it actually hurts for C to do
this.
"A string is 8 bytes" No it isn't and this has not been the case since
under Deng Xiao Peng China entered the real world.
It took me a while to figure out what you meant by this. Of course a
string isn't 8 bytes, unless it just happens to have that length. I
think what you meant is that C claims that a string is composed of
8-bit characters.
If this is what he means, it's a quick reversal. Only a few weeks ago he
seemed to be objecting to the idea that a byte might be more than 8
bits. To quote:
| -Nilges-> To say that C can with anything but clumsiness handle wider
| (or narrower) characters is false, since it is so falsely "powerful"
| that to change a C program to handle other than 8 bits implies a line
| by line audit. Herb is pragmatically correct.
I may also be mixing this up with something else, but I believe he also
objected to ASCII being described as 7 bits.
But yes, in practice, CHAR_BIT==8 on all real-world hosted
implementations that I'm aware of.
In particular, this is a requirement of POSIX (something they added
after I pointed out the implications of not having it).
C does provide a wchar_t type
capable (at least potentially) of representing larger character sets,
though support for it is somewhat sketchy.
Um, what is missing? Noting that a *lot* got added in C94.
"Aliasing gives me power" No it doesn't. It means that you have been
too lazy to find an effective algorithm.
I don't know what you mean by that. Is it something to do with what's
sometimes called "type punning"? Perhaps an example would help.
I suspect it's a reference to something I wrote in 1994:
========
However, what is important is why
only some lvalues can refer to a given object, and the annotations
completely skip this. The reason is, of course, to indicate when a
compiler can assume that two identifiers refer to the same object.
For example, in:
    char *cp;
    int *ip;

    void f (double *d)
    {
        *d = 3.14159;
        *cp = 1;
        *ip = 2;
    }
The rules of this section say that the assignment to *cp could
potentially alter *d, and the compiler must generate code that takes
that into account, but the assignment to *ip cannot, and the compiler
may assume that *d and *ip do not overlap. This is called aliasing,
and knowing when aliasing takes effect is an important factor in
correctly optimising code.
========
and his recent response:
| -Nilges-> The only way of "correctly optimizing code" is not to use
| aliasing so pathologically but to intelligently use an optimizing
| compiler. The compiler determines whether the lValues can refer to the
| same object. The programmer should avoid using global variables as
| much as possible; this is the real lesson of the above crap code,
| along with the need to organize things into structs when they are
| global.
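
To make the aliasing rule from the 1994 text concrete, here is a minimal sketch of the case it permits: a store through a character-type lvalue that really does land inside a double. The functions store_three and char_store_alters_double are hypothetical names for this illustration. The sketch uses unsigned char and flips one bit (rather than storing 1) so that the object is guaranteed to change.

```c
#include <string.h>

/* Parameterised variant of the quoted f(); the names are illustrative. */
void store_three(double *d, unsigned char *cp, int *ip)
{
    *d = 3.14159;
    *cp ^= 1u;  /* character-type lvalue: may legally modify part of *d */
    *ip = 2;    /* int lvalue: the compiler may assume *d is untouched */
}

/* Returns 1 if the store through cp changed the bytes of d. */
int char_store_alters_double(void)
{
    const double expected = 3.14159;
    double d = 0.0;
    int i = 0;

    /* Pointing cp at d is allowed; pointing ip at d would not be. */
    store_three(&d, (unsigned char *)&d, &i);
    return memcmp(&d, &expected, sizeof d) != 0;
}
```

Because cp may point into *d, the compiler cannot keep 3.14159 cached across the second store. Passing (int *)&d as the third argument instead would make the store through ip undefined behaviour, which is exactly what lets the compiler assume no overlap there.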
"Don't make me think! Just make the behavior undefined in the
standard!" Up yours, pal.
The standard does not define the behavior of all possible constructs.
He doesn't seem to like this concept. Indeed, at times, he believes that
the *correct* meaning of any C construct is what his MS-DOS compiler
produced.
Perhaps you'd like to design a systems programming language that
doesn't have this problem. Perhaps you could make technical points
without resorting to abusive language.
I wouldn't hold your breath.
"A regular expression is what my code can handle" No, it isn't: the
theory was developed before computers.
I don't understand whatever point you're making here. If you're
refuting a claim that someone has made, I don't recall seeing it.
I *think* this is something to do with a regular expression parser in a
book called "Beautiful Code" which doesn't handle Unicode or certain
badly-formed regexes. But I must admit to only skimming those threads.
"A struct is a class" No, it isn't.
No such claim has been made with regard to C, because C doesn't have
classes.
Indeed.
"A for is just while with sugar": no it isn't. The for loop needs to
evaluate invariants before it starts, but if you send a boy (Ritchie)
to do a man's job, you get a useless for which might as well be a
while.
Surely you could have made whatever point you're making without
personal insults.
A C for loop is just a while with sugar, as you acknowledge in the
same paragraph ("might as well be a while"). Perhaps what you mean is
that a C for loop isn't defined the way you want it to be.
Once upon a time, Edward wrote a piece of code that went something like
this (I haven't memorised it, so exact details might be wrong):
    for (intIndex = 0; intIndex < strlen (strString); intIndex++)
        strCopy [intIndex] = strString [intIndex];
Don't worry too much about the detail; the key points are the use of
strlen in the loop test and the fact that the string is not modified by
the loop body.
Somebody pointed out that this was inefficient because strlen() gets
called every time round the loop and it would be better to compute it
once and store it in a variable. For some reason he seems to think that
this was a personal attack and, rather than accepting he misremembered
the semantics of loops in C, claims that the condition *should* be
evaluated only once. Or something like that - it's not always easy to
understand our Edward.
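
For anyone who wants to see the point rather than argue it, here is a minimal sketch. It counts how often the loop test calls the length function, then shows the version with the length computed once. The names counting_strlen, naive_copy_calls and copy_hoisted are hypothetical, and unlike the loop as I remembered it, both versions also write the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical call counter, for illustration only. */
static size_t strlen_calls;

static size_t counting_strlen(const char *s)
{
    strlen_calls++;
    return strlen(s);
}

/* Copy with the length recomputed in the loop test.
   dst must have room for strlen(src) + 1 characters.
   Returns how many times the length function was called. */
size_t naive_copy_calls(char *dst, const char *src)
{
    size_t i;

    strlen_calls = 0;
    for (i = 0; i < counting_strlen(src); i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return strlen_calls;
}

/* Same copy with the length computed once, before the loop. */
void copy_hoisted(char *dst, const char *src)
{
    size_t len = strlen(src);

    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    dst[len] = '\0';
}
```

For a five-character string the first version calls the length function six times: once per iteration plus the final failing test. Since each call walks the whole string, the work grows with the square of the length unless the compiler can prove the string is unchanged and hoist the call itself.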
Much later I *very* briefly mentioned the idea of a "parallelising C"
which might write:
    for (i in eachof [0, strlen (s) - 1])
        d [i] = s [i];
or something like that. Clearly it's a conspiracy that nobody told me
off for putting a function call in the loop condition.
"Here's the preprocessor. Don't use it.": a Biblical injunction:
here's the apple tree, guys. Don't eat the apple and don't drink the
Kool Ade: but you must and will, and I'm God.
Who exactly is claiming to be "God"?
The C preprocessor is a powerful tool, and it's extremely easy to
abuse and/or misuse it.
I *think* the history of this is something like:
Nilges: memcpy is dangerous.
Others: not if used properly.
Nilges: yes it is, because a programmer can write
#define memcpy something_else
and you can't spot the breakage without auditing every single line of
the code. Other languages don't have this kind of global scope.
Others: nobody sane would do that in the way you've written.
As shown elsewhere (look for TICKS_PER_SEC (sic) if you want examples),
he doesn't understand the difference between #define in standard headers
and in user code. And he doesn't accept the idea of sharp tools that can
be dangerous if misused.
He also hasn't spotted that his beloved Algol has features which can do
far more damage than the C pre-processor; I can write a line at the
*end* of an Algol program that completely changes the meaning of the
rest of the code and would require a line-by-line audit to spot.
"Pointers are unsigned integers and as such compareable. No they
aren't. Yes. Just kidding, they aren't.": C "experts" often sound like
the villain in the movie Dodgeball, "White" Goodman.
No, pointers aren't unsigned integers. They can be compared *as
pointers*.
I haven't the slightest idea what he's complaining about here.
"Here's post and pre increment. Don't try to guess when they happen":
there weren't enough lies in 1999, so yippee let's standardise the
language and add more!
The C standard does not define the behavior of certain expressions
such as ``i=i++;''. It states this clearly and explicitly. You can
argue that this is poor language design, but how is this a "lie"?
Perhaps you're using the word "lie" in some non-traditional way.
You need the full backstory here.
Once upon a time there was someone called Herb Schildt, who Edward seems
to think is a demigod or at the least a beatified martyr whose
reputation needs to be defended. Herb wrote a book called "The Annotated
C Standard". In 1994 I wrote a review of this book called "The Annotated
Annotated C Standard"; you can read it at:
<http://www.davros.org/c/schildt.html>
In it, I wrote:
========
## The standard states that when an expression is evaluated, each
## object's value is modified only once. In theory, this
## means the compiler will not physically change the value of a
## variable in memory until the entire expression has been
## evaluated. In practice, however, you may not want to rely
## on this.
The book then in effect goes on to say that "i = ++i + 1" is usually
compiled as if it were "i += 2".
========
where the first paragraph is a direct quote from the book.
Edward's response to this was:
| -Nilges-> The pathological code DOES have a defined value for each
| compiler. Herb DOES advise against the bad practise. Herb is right and
| you are wrong. Everything's "defined". It's not black magic,
| howevermuch that would suit those who don't understand their business.
|
| -Nilges-> Since it's malpractice to write new code in C, the actual
| job of many C programmers is to maintain old code, most of which is
| nonconformant. The code has a defined result every time it runs, and C
| programmers need to know the range of what could happen. Your time
| would have been better spent not "standardizing" C but treating it
| more as a linguist treats natural language.
|
| -Nilges-> You would have served genuine needs had you gone out in the
| field and described what major compilers DO. This is what Herb is
| doing.
As shown in subsequent discussions, he appears to consider sequence
points the work of the industrial-military complex and responsible for
the September 11th attacks and the loss of the orbiter Challenger.
Indeed, what drew me to this group was Nilges's (still unfinished) attempt
to deconstruct my review as a content-free personal attack on Schildt.
Now, I'll be the first to admit that sequence points aren't the
prettiest feature of C, but they're an attempt to codify a range of
behaviours of existing pre-standard compilers. As such, we have to live
with them. Pretending that code like "i = i++" is well-defined helps
nobody.
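
The practical way out is to write down the intended meaning so that no object is modified twice between sequence points. A minimal sketch, with hypothetical function names chosen for this example:

```c
/* Intent "add one to i": say so directly. */
int add_one(int i)
{
    i++;            /* or: i = i + 1; */
    return i;
}

/* Intent "add two to i", the meaning the book attributes to the
   undefined expression "i = ++i + 1". */
int add_two(int i)
{
    i += 2;
    return i;
}

/* Intent "yield the old value, then increment": two statements. */
int post_increment(int *p)
{
    int old = *p;

    *p = old + 1;
    return old;
}
```

Each of these has exactly one meaning on every conforming compiler, which is the property that "i = i++" lacks.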
--
Clive D.W. Feather | Home: <clive@xxxxxxxxxx>
Tel: +44 20 8495 6138 (work) | Web: <http://www.davros.org>
Fax: +44 870 051 9937 | Work: <clive@xxxxxxxxx>
Please reply to the Reply-To address, which is: <clive@xxxxxxxxxx>