Re: A note on computing thugs and coding bums
- From: spinoza1111 <spinoza1111@xxxxxxxxx>
- Date: Fri, 11 Jan 2008 06:49:19 -0800 (PST)
Ben's posts are examples of what I'm talking about! He's able to find
flaws in the code and address my points without Schildt-like attacks
on people.
Randy Howard, Richard Heathfield, and Willem: either emulate him or
leave.
Here's my response including a bug fix.
On Jan 11, 1:08 pm, Ben Bacarisse <ben.use...@xxxxxxxxx> wrote:
> spinoza1111<spinoza1...@xxxxxxxxx> writes:
> <snip>
> > This was a technical post describing my concern, with references, to
> > code presented as Beautiful which seemed to me Ugly because it didn't
> > even work for modern strings [...]
> Fairer to say that it is "old" code. Your C# "modern string" version
> would not have been possible at the time. Technology marches on and
The problem here is that the publication date of the book Beautiful
Code was 2007. It is intellectually dishonest to maintain that
something as pragmatic, as applied, as code can be beautiful, at least
without an explicit statement from Kernighan that "I ignore
international characters", a statement he did not make.
> we stand, to some extent, on the shoulders of giants. Incidentally,
> (talking of giants) the UTF-8 encoding of Unicode that has so helped
> to make "modern strings" possible was designed and first implemented
> by Ken Thompson -- prompted by Rob Pike.
Which makes Kernighan's omission even more pernicious.
American computer scientists and their fellow-travelers all over the
world insist on using C to talk about algorithms, and this is a
serious mistake, as exhibited by the ease with which I transliterated
the code and created thereby an out of bounds bug (immediately caught
by the runtime, as it would be in production) for the regex A*.
Pike's code is bug-free, it appears (although I have no proof of this,
and haven't seen it run bug-free according to its own charter). This
is not because he's a demigod. My own experience between 1970 and
1975, a period in which I used assembler exclusively, was that by dint
of constant work with a language you avoid making stupid mistakes. I
had this same experience when I used C heavily between 1987 and 1992,
yet I blundered when returning to it...for example, I computed
loop-invariant values in a for loop header.
But all this means is that C unnaturally imposes tics and expectations
on the programmer coming out of a specific place and time.
It is bad teaching practice to teach, as good computer science, in a
notation that requires the student to know and remember that (1) value
parameters can be written safely in C and (2) being off by one is no
sweat, because Nul saves you.
In particular, the latter "feature" creates a coding bum who simply
doesn't worry about such errors.
When I saw the code passing an incremented but unchecked regex, I was
astonished at Pike's and Kernighan's silence on what essentially was
the presentation of pathology as Beauty, for the same reason I was
astonished by the pathology and silence of using a value parm as a
work area. More than this, the arrogance of the pathology, the
arrogance of the brutal silence, and the arrogance of the praxis made
me physically ill.
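For readers unfamiliar with the two C idioms criticized above, here is a minimal sketch. The function names count_leading and next_is_star are hypothetical and do not come from the book; they only illustrate a by-value pointer parameter used as a work area and a one-character lookahead that relies on the Nul terminator.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many times c occurs at the start of s.
   s is a by-value copy of the caller's pointer, so advancing it here
   does not change the caller's variable. */
size_t count_leading(const char *s, char c)
{
    size_t n = 0;

    /* For any c other than '\0', the loop stops at the terminator
       because the terminator never compares equal to c. */
    while (c != '\0' && *s == c) {
        s++;   /* the parameter itself is the work area */
        n++;
    }
    return n;
}

/* Hypothetical helper: reports whether the character after s[0] is '*'.
   Reading s[1] is in bounds only when s[0] is not the terminator,
   so that condition is checked first. */
int next_is_star(const char *s)
{
    return s[0] != '\0' && s[1] == '*';
}
```

Both functions are safe only if the caller really passes a Nul-terminated string, which is exactly the unstated precondition under dispute.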
Now, I am aware that at Princeton University, where Kernighan teaches,
C is a required language for majors in computer science. In fact, it is
required but not taught, for the same reason Princeton's math classes
start with calculus. I know this because part of my job at Princeton
between 1987 and 1992 was teaching C, during a period in which I used
it and assisted Nash, and became aware of its unusability (Nash
switched to Mathematica).
But during that time, I found myself needing to create a systematic
set of functions associated with structs merely to convert their
contents to strings, to compare them, and to duplicate them. These
were of course precursors of modern, object-oriented, methods such as
toString(), compare() and clone().
It seemed, to use a Shakespearean turn of phrase, a waste of spirit in
an expense of shame to create separate but equal functions, each of
which posed a danger in turn. I could see, easily enough, that the
process had a finite bound. But compared to what I knew of
mathematical work, it seemed utterly pointless.
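The per-struct helper functions described above might look roughly like the following sketch. The struct point type and the three function names are hypothetical, chosen only for illustration, and snprintf is a C99 function that would not have been available in the period the post describes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type used only for illustration. */
struct point {
    int x;
    int y;
};

/* Analogue of toString(): formats *p into buf, writing at most size
   bytes including the terminator. Returns what snprintf returns. */
int point_to_string(const struct point *p, char *buf, size_t size)
{
    return snprintf(buf, size, "(%d, %d)", p->x, p->y);
}

/* Analogue of compare(): orders by x, then by y.
   Returns a negative value, zero, or a positive value. */
int point_compare(const struct point *a, const struct point *b)
{
    if (a->x != b->x)
        return (a->x < b->x) ? -1 : 1;
    if (a->y != b->y)
        return (a->y < b->y) ? -1 : 1;
    return 0;
}

/* Analogue of clone(): returns a heap-allocated copy of *p,
   or NULL if allocation fails. The caller frees the result. */
struct point *point_clone(const struct point *p)
{
    struct point *copy = malloc(sizeof *copy);

    if (copy != NULL)
        *copy = *p;
    return copy;
}
```

Every new struct needs its own trio of such functions, which is the repetition the paragraph above complains about.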
My mates also seemed strangely infantilized by C. If passed a string
not terminated by Nul, it was always "the user's fault". Some of them
seemed to manifest control-freak psychoses because their little plans,
for example to preallocate arrays with secret bounds, were always
screwing up their lives when in actual production, the need was for a
larger array allocated using malloc. It always seemed to them "too
much work" to allocate variable-size objects off the stack at all
religiously, with some sort of check on the return value to make sure
it was not null.
Whereas today I can try and I can catch.
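The checked allocation that the colleagues found "too much work" can be sketched as follows. The helper name copy_bytes is hypothetical; this is one plausible shape of the discipline described, not code from the period.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, Nul-terminated copy
   of the first n bytes of src, or NULL if the request cannot be met.
   The caller must ensure src has at least n readable bytes and must
   free the result. */
char *copy_bytes(const char *src, size_t n)
{
    char *dst;

    if (n == (size_t)-1)      /* n + 1 would wrap around to 0 */
        return NULL;

    dst = malloc(n + 1);
    if (dst == NULL)          /* the check that was "too much work" */
        return NULL;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}
```

The size is chosen at run time rather than fixed by a preallocated array bound, and the caller learns of failure through the NULL return instead of an exception.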
Which means that in the Princeton CS program, the students have had to
acquire as a condition for success a set of tics that they will, as
undergraduates and beyond, confuse with knowledge, making them less
than they could be.
They will believe false propositions such as "runtime bounds checks
are unmanly and inefficient, for sissies and script kiddies and
girls", "don't worry about off by one", and "all my callers will send
me strings using English-language characters properly bound by Nul",
and "if they don't, call Homeland Security".
An American-centric and political narrowness and xenophobia is the
direct result.
They will use a language for thinking about algorithms ridden with
clap trap and later on, as distinguished researchers, they will close
their ears to anything else.
> > and had numerous other technical flaws.
> Allegedly. In another post I tried to cut through the verbiage to find
> them and I think they are minor.
They are not minor. C imposes a psychology that causes bugs.
> > I then bench-marked it in a C++ [...]
> I don't think it was C++. I don't know the details but it looked like
> something else -- I suggested C++/CLI elsethread -- but you probably
> know and could tell us.
Visual C++ Express to generate a command line executable.
> > wrapper against a C sharp version
> > which fixed the flaws to discover that the C sharp version is about
> > 3 to 5 times slower...
> You'd be better off porting the exact original to C# as well (alleged
> bug and all). You could then compare algorithms. I suspect the
> slowdown you see is simply C# doing its stuff. You would then have
> comparable numbers to see if your "fixes" have any real cost.
> [See below for why "fixes" is in "quotes".]
I've already addressed this. The C Sharp version is about 5 times as
slow. But it will work, I believe, when the bugs you found are fixed,
while the Pike code will NEVER work...for international strings.
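One way to read the "international strings" objection is that a byte-oriented C routine sees bytes, not characters. The sketch below assumes valid UTF-8 input, and the name utf8_count is hypothetical. It shows only that byte length and character count diverge for non-ASCII text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts code points in a Nul-terminated string,
   assuming the input is valid UTF-8. Continuation bytes have the bit
   pattern 10xxxxxx and are skipped; every other byte starts a new
   code point. */
size_t utf8_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "caf\xC3\xA9" (the word "café" encoded as UTF-8), strlen reports 5 bytes while utf8_count reports 4 characters, so a single-character wildcard that advances one byte at a time would split the last character in two.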
> > but actually works, which the Kernighan code
> > doesn't owing to Kernighan's and Pike's unfortunate, and C-psychology,
> > mindset.
> No, your re-hash of the original has broken it. You have so messed up
> the neat C code it is hard to find where your bugs are but they are in
> there.
No, I added output to fully explain what was being done to forestall
stupid questions. The Beautiful code runs to two pages.
> I'll give you two examples. Try to match "a*ab" against "aab". You
> should get a match (the Pike code does) but both the C++ and the C#
> versions you posted disagree. The other is more dramatic: on my C# vm
> matching a pattern like "a*" gives me a helpful crash and an array
> bounds violation so the error was quite easy to find. For the fix for
> both, refer to Rob Pike's "ugly" code.
Yes, it appears I buttfucked the typing-in of the Pike code and made a
further error (in addition to the one I've identified) in my desire to
get a quick view of *how much* slower C sharp runs. I am traveling on
business, and I do not have the Beautiful Code book with me.
On Saturday evening, I shall review the C++ copy of the Pike code for
errors in typing and, time permitting, get a native purely C compiler
to run it for timings closer to pure C. I will also examine the
behavior of the C Sharp code.
I am also writing a regex in the style of "Build Your Own .Net
Language and Compiler", which embeds tests in the same way such that I
can run hundreds of tests before submitting it.
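The idea of a matcher that carries its own tests can be sketched in C as below. The matcher is written from memory in the style of the code under discussion and should not be taken as a verbatim copy of the book's listing. The test table and the run_tests function are hypothetical additions, and the table includes the two cases raised in this thread ("a*ab" against "aab", and the pattern "a*").

```c
#include <stddef.h>

int matchhere(const char *regexp, const char *text);
int matchstar(int c, const char *regexp, const char *text);

/* Searches for regexp anywhere in text. Supports c . ^ $ and *. */
int match(const char *regexp, const char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {    /* must try even if text is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* Matches regexp at the beginning of text. */
int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* Matches zero or more occurrences of c followed by regexp. */
int matchstar(int c, const char *regexp, const char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Hypothetical embedded test table: pattern, text, expected result. */
struct regex_case {
    const char *regexp;
    const char *text;
    int expected;
};

static const struct regex_case cases[] = {
    { "a*ab", "aab", 1 },   /* first case raised in the thread */
    { "a*",   "",    1 },   /* second case: star at end of pattern */
    { "a*",   "aaa", 1 },
    { "^ab$", "ab",  1 },
    { "^ab$", "abc", 0 },
    { "a.c",  "abc", 1 },
    { "x",    "abc", 0 },
};

/* Runs every case in the table and returns the number of failures. */
int run_tests(void)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (match(cases[i].regexp, cases[i].text) != cases[i].expected)
            failures++;
    }
    return failures;
}
```

Calling run_tests() before each submission and refusing to post unless it returns 0 is the C equivalent of the embedded-test habit described above. Extending the table is a one-line change per case.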
Thanks for your contribution. I'll discount your smart remarks.
At this time, I stand by my original claim: a factor of five doesn't
justify presenting code that doesn't work for real, international
strings, and calling a grepper a regular expression processor. The
"inefficiency" of Java and .Net is an Urban Legend.
> --
> Ben.