Brian Kernighan, maybe I'm not worthy, maybe I'm scum



This is a blog post at www.developerDotStar.com, reposted here for
useful comments. It's a critique of the "Beautiful" code authored by
Rob Pike and discussed by Brian Kernighan in a new O'Reilly book.
Because I don't use C (it's a pernicious language) I may have missed
something.

Flamers may flame, and be damned. Stalkers may stalk, and be damned.
Cyberbullies may cyber-bully, and be damned.

Here we go, then.


When I started working at Princeton University's Information Centers
in 1987, my boss said to me, at a reception for alumni student Info
Center workers, "Ed, this is *Brian Kernighan*."

It was all I could do to avoid making the yi-san-er kow-tow like Wayne
and Garth in Wayne's World, saying, I'm not worthy I'm not worthy I'm
scum.

I'd been very impressed with Kernighan's work, in particular his
literary and critical "take" on programming style in books he wrote in
the 1970s.

However, I was disappointed by his essay "A Regular Expression
Matcher", in Beautiful Code (O'Reilly Media, Sebastopol CA 2007) and
moreover by the book.

Kernighan's essay, and the book in general, is post-Dijkstra and
post-Algol, reflective in fact of an American-centric
anarcho-conservatism in which people retreat behind the flimsy yet
obdurate walls of their favorite platforms without acknowledging that
this serves corporate needs exclusively.

Mere structs are proudly labeled objects in Web hacks. In general,
people are proud to be members of in-groups, not to evolve a global
computing language. Dijkstra isn't even in the index despite the fact
that he practically invented the very idea that code needed, as what
Dijkstra called a matter of life and death, to be elegant
syntactically and semantically over a lifetime of versions.

Turning to Kernighan's example of "beautiful" code, written in one
hour by Rob Pike in 1998, we find the code I've placed at the bottom
of this post.

[Note: Pike isn't in the index either. Oh well, he's just a
programmer, right? The "programming style" movement that Kernighan
helped start recentered the mere programmer as being a creative and
intelligent Subject, but a new conservatism relegates him along with
Dijkstra to a common grave.]

The privatized industrial *milieu* in which I fear Kernighan operates
unquestioningly is one in which, owing to profit pressures, there's
never enough time to do it right, but plenty of time, and job
opportunities, to do it over. He prepares his students at Princeton
for a lifetime in that milieu, and it is shown by the fact that to
Kernighan, it's a Good Thing that Rob only took an hour.

Paradoxically, perhaps dialectically, the programmer's time, by
becoming priced so high in an imaginary market, becomes worthless, as
does his authorship. A constant feature of programming is the after
hours free labor of "valuable" programmers, where the worthlessness of
their free time varies inversely to the market price of their billable
time. I discovered at one firm that one of the Most Valuable Players
had become an MVP by not ever reporting the extra hours he worked,
even though he would have been paid for those hours by company policy.
Rob's product is valued because it isn't "pretentious": he didn't have
the right to work longer, only harder, producing, as we'll see, a
flawed piece of code for posterity.

The most serious flaw is that it uses the text parameter of match as
the index to the text "string" (which is only painfully a modern
string, as we'll see, whence the scare quotes).

Yes, Brian, I understand that in C the programmer can use a name as a
Von Neumann address to a byte now and forever, amen. I understand that
text is passed, as are all variables in C, by value, and that
therefore on a stack implementation (yes, the only possible
implementation) the programmer is "free", o happy day, to use the
stack copy, text itself, oh frabjous day, as the index itself.

"Freedom, hoy-day, freedom!" - Shakespeare, The Tempest

But I question the right of the C programmer to rely on an idiom as
confusing as treating a value parameter, something which is not
normally modified, as modifiable.
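
For readers who don't use C, here is a minimal sketch of the idiom under discussion. It is plain C rather than the C++/CLI wrapper at the bottom of this post, and the function name count_chars is hypothetical, invented here only for illustration. The function walks the string by incrementing its own parameter, as match does with text.

```c
#include <assert.h>
#include <stddef.h>

/* count_chars (hypothetical helper, not from the book): count the bytes
   before the terminating NUL by incrementing the parameter itself. */
size_t count_chars(const char *text)
{
    size_t n = 0;

    assert(text != NULL);
    while (*text++ != '\0')   /* changes only this function's copy of text */
        n++;
    return n;
}
```

If a caller holds a pointer s to "DDC" and calls count_chars(s), the result is 3 and s still points at the first D afterwards, because the pointer was copied when it was passed.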

Brian, you cannot have your cake and eat it too, even on Christmas
Day. You want to use C as a way to speak about Beautiful Code to
Beautiful Minds.

But this requires the Beautiful Mind to call back to mind the fact
that in most, but not all, stack virtual machines, value parameters on
the stack can be slyly, in what I think to be an Ugly way, used as
"work" variables.

Unless of course the C virtual machine is a piece of hardware, on
which value parameters placed on the stack cannot be modified...or can
be only slowly, while new values (function end results) can be pushed
quickly. Such a machine, implemented in embedded hardware, is
conceivable.

In the interests then of using C as you want to use it, as an Algol
style publication language, you should have modified Rob's code to
copy text to a private work variable. The "waste" of a cycle would
save the time of the program reader.

C, as a language for talking about algorithms, is pernicious because
it requires the reader to "understand" too much, viz., that it is a
high level assembler language which obscures as much as it shows.

I understand your point that in Java and in C Sharp, the programmer
would have been compelled (boo hoo) to use a string object, and this
would be perhaps more time costly...at least on an unoptimized
platform.

But in the exposition after the code is presented you follow the rule
of the Duke of Wellington: never apologize, never explain. I looked
for an apologetic explanation, addressed to people who don't use C,
that text may be used as an index in the way you use it.

You also fail to mention that the program doesn't even work for modern
strings, only for what we call in .Net, sbyte, strings o' bytes. To
call Rob's code in .Net, I have to create an Unsafe interface.

In 1998, it was painfully arguable that the world was still interested
in scanning strings o' bytes using regular expressions that are also
strings o' bytes.

Missing is Bjarne Stroustrup's Danish internationalism. In 2004 in
Shenzhen I read code which applied regular expressions to Chinese text
in double-byte Unicode.

C programmers claim that their language is Unicode-aware because they
are, and they can hack it. This isn't the point. The point is
information hiding: Java and C Sharp strings can handle international
input, while Rob's code cannot, and it will be keyed in, as Beautiful
Code, by tyro programmers, and it will fail. As in the case of the use
of the text variable, you don't discuss this issue.
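
A small sketch of why a byte-at-a-time matcher struggles with international text. It assumes the input is encoded as UTF-8 (an assumption made here for illustration; the Shenzhen code mentioned above used a double-byte encoding), and the names utf8_code_points and extra_bytes are hypothetical helpers, not part of Rob's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* utf8_code_points (hypothetical): count characters (code points) in a
   UTF-8 string by skipping continuation bytes of the form 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* extra_bytes (hypothetical): how many bytes the string occupies beyond
   one byte per character. Zero for pure ASCII. */
size_t extra_bytes(const char *s)
{
    return strlen(s) - utf8_code_points(s);
}
```

The single Chinese character U+4E2D is the three bytes E4 B8 AD in UTF-8, so extra_bytes reports 2 for it. A matcher in which '.' consumes exactly one char therefore needs three dots, not one, to step over that single character.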

As a relatively minor cavil, I know that it has been very hip, since
the days of the PDP8, to use lower case. The problem is that matchhere
is ugly because it uses double h in a way that doesn't occur in
English: it's Klingon, and why not bite the bullet and use matchHere?

In my opinion, camelCase is a thing of beauty in that it avoids the
class bias of Proper case, refusing the first letter a special honor
merely because of primogeniture, but preserving needed breaks between
words. I understand that MATCH_HERE or match_here would be Coyote Ugly
in a use of underscore common in the IBM, PL/I and Rexx tradition, but
camelCase is to me an excellent compromise.

Also, why couldn't Rob have returned an index to the place where the
regular expression matches, and, to be consistent with zero origin
indexing, -1 for no hit? It's not elegant in a real timesaving way to
do all the work you do, and return such a thin cold response to
callers whose next question will be where the heck does the match
start!
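
A hedged sketch of what such an interface might look like, in plain C. The name match_index and its long return type are assumptions made for this illustration; matchhere and matchstar follow the listing at the bottom of this post, with const added. The sketch also walks a private cursor rather than the text parameter, which is what makes the offset computable.

```c
#include <assert.h>
#include <stddef.h>

static int matchhere(const char *regexp, const char *text);

/* matchstar: search for c*regexp at beginning of text */
static int matchstar(int c, const char *regexp, const char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* matchhere: search for regexp at beginning of text */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* match_index (hypothetical): zero-based offset where the leftmost match
   starts, or -1 when regexp matches nowhere in text. */
long match_index(const char *regexp, const char *text)
{
    const char *cursor = text;   /* private work variable; text is not modified */

    assert(regexp != NULL && text != NULL);
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text) ? 0 : -1;
    do {    /* must look even if string is empty */
        if (matchhere(regexp, cursor))
            return (long)(cursor - text);
    } while (*cursor++ != '\0');
    return -1;
}
```

With this version, match_index("C", "DDC") yields 2 and match_index("^C", "DDC") yields -1. Note that match_index("C*", "DD") yields 0, since zero occurrences of C match at the very start.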

It's very confusing to have the code return 1 (affirmative) for
regular expression C* in string DD until you remember "zero, one, or
more" but there's no helping this. The reader has to understand the
basics.

I like the way you show how to use recursion simply in the real world;
too many programmers say "ooooo recursion, scary and time wasting,
don't make me think, aaaargh".

But in general, I think your C has become a lingua franca and as such
a Frankfurt, a walled city with its own shibboleths.

This to me is contrary to the open, humanistic and critical spirit of
Kernighan's early writings on programming style, which deconstructed
bad Fortran and showed how it could be rewritten even in Fortran
itself.

I may not have realized that my (computing) generation was engaged not
in liberating computing from the shackles of privatized idiocy, but
merely in an Oedipal destruction of the 1960s generation of the IBM
fathers, creating, sad to say, its own shibboleths and orthodoxy...in
which even such a brilliant man as Kernighan stays within a discourse
boundary in which he cannot explain that "we use the value parameter
text to index, don't worry, your copy of text will not be changed
because text is passed by value".

This is a computing culture dominated by American private corporations
which use computing to obscure. I have learned, for example, that the
securitized mortgages which are creating a global economic crisis
cannot be reverse engineered in such a way as to find the original
debtors, and reprice the security by determining the best risks.

This was probably because the programmers were under orders to do
things in an hour or less, as Rob Pike was praised for, and they
appeared to have discarded the idea of keeping keys in new tranches.
I'm not an expert on this, but I have enough experience to somehow
intuit that during the Big Party, yuppie scum who were getting rich
were bullying programmers to emulate Rob, and discard "unnecessary
features", whether international strings, or an audit trail.

Dijkstra is rolling in his grave. While he was alive, there was to my
knowledge absolutely no two-way communication between Dijkstra and
guys like Brian because in the American corporate context, Dijkstra's
honest criticism was a speed bump.

Brian's essay, Rob's code, and the rest of the book are a
disappointment, I'm afraid.

Here's the code wrapped in a value class for Microsoft Visual C++
(Visual Studio) with a simple main().


public value class kernighanRegexC
{
public:
    /* match: search for regexp anywhere in text */
    static int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do {    /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

private:
    /* matchhere: search for regexp at beginning of text */
    static int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    static int matchstar(int c, char *regexp, char *text)
    {
        do {    /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }
};

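using namespace System;   /* needed for Console */

int main(array<System::String ^> ^args)
{
    /* prints 1 if "C" occurs anywhere in "DDC", otherwise 0 */
    Console::WriteLine((kernighanRegexC::match("C", "DDC")).ToString());
    return 0;
}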