Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- From: spinoza1111 <spinoza1111@xxxxxxxxx>
- Date: Sun, 30 Dec 2007 04:09:47 -0800 (PST)
On Dec 26, 2:38 pm, spinoza1111 <spinoza1...@xxxxxxxxx> wrote:
This is a blog post at www.developerDotStar.com, reposted here for
useful comments. It's a critique of the "Beautiful" code authored by
Rob Pike and discussed by Brian Kernighan in a new O'Reilly book.
Because I don't use C (it's a pernicious language) I may have missed
something.
Flamers may flame and be damned. Stalkers may stalk, and be damned,
Cyberbullies may cyber-bully, and be damned.
Here we go, then.
When I started working at Princeton University's Information Centers
in 1987, my boss said to me at a reception for alumni student Info
Center workers, "Ed, this is *Brian Kernighan*."
It was all I could do to avoid making the yi-san-er kow-tow like Wayne
and Garth in Wayne's World, saying, "I'm not worthy, I'm not worthy,
I'm scum."
I'd been very impressed with Kernighan's work, in particular his
literary and critical "take" on programming style in books he wrote in
the 1970s.
However, I was disappointed by his essay "A Regular Expression
Matcher" in Beautiful Code (O'Reilly Media, Sebastopol CA 2007), and
by the book as a whole.
Kernighan's essay, and the book in general, is post-Dijkstra,
post-Algol, reflective in fact of an American-centric
anarcho-conservatism in which people retreat behind the flimsy yet
obdurate walls of their favorite platforms without acknowledging that
this serves corporate needs exclusively.
Mere structs are proudly labeled objects in Web hacks. In general,
people are proud to be members of in-groups rather than to help evolve
a global computing language. Dijkstra isn't even in the index, despite
the fact that he practically invented the very idea that code needed,
as a matter of what he called life and death, to be elegant
syntactically and semantically over a lifetime of versions.
Turning to Kernighan's example of "beautiful" code, written in one
hour by Rob Pike in 1998, we find the code I've placed at the bottom
of this post.
[Note: Pike isn't in the index either. Oh well, he's just a
programmer, right? The "programming style" movement that Kernighan
helped start recentered the mere programmer as being a creative and
intelligent Subject, but a new conservatism relegates him along with
Dijkstra to a common grave.]
The privatized industrial *milieu* in which I fear Kernighan operates
unquestioningly, preparing his students at Princeton for a lifetime in
which, owing to profit pressures, there's never enough time to do it
right but plenty of time (and job opportunities) to do it over, is
shown by the fact that to Kernighan, it's a Good Thing that Rob took
only an hour.
Paradoxically, perhaps dialectically, the programmer's time, by
becoming priced so high in an imaginary market, becomes worthless, as
does his authorship. A constant feature of programming is the
after-hours free labor of "valuable" programmers, where the
worthlessness of their free time varies inversely with the market
price of their billable time. I discovered at one firm that one of the
Most Valuable Players had become an MVP by never reporting the extra
hours he worked, even though company policy would have paid him for
those hours.
Rob's product is valued because it isn't "pretentious". He didn't have
the right to work longer, only harder, and he produced, as we'll see,
a flawed piece of code for posterity.
The most serious flaw is that it uses the text parameter of match as
the index into the text "string" (which is only painfully a modern
string, as we'll see, whence the scare quotes).
Yes, Brian, I understand that in C the programmer can use a name as a
Von Neumann address to a byte now and forever, amen. I understand that
text is passed, as are all arguments in C, by value; therefore, on a
stack implementation (yes, the only possible implementation), the
programmer is "free", o happy day, to use the stack copy, text itself,
oh frabjous day, as the index.
"Freedom, hoy-day, freedom!" - Shakespeare, The Tempest
But I question the right of the C programmer to rely on something as
idiomatic as treating a value parameter, something which is not
normally modified, as a modifiable work variable.
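For readers who don't use C, here is a minimal sketch of the idiom in question. The function name count_chars is hypothetical and not from the book; the point is only that the callee can advance its pointer parameter freely, because that parameter is a copy of the caller's pointer.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: counts the characters in s by
   advancing its own copy of the pointer. C passes arguments by value,
   so the caller's pointer still points at the first character afterwards. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s++;   /* moves only this function's copy of the pointer */
        n++;
    }
    return n;
}
```

Only the pointer is copied, not the characters it points to; a callee that wrote through s would change the caller's string.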
Brian, you cannot have your cake and eat it too, even on Christmas
Day. You want to use C as a way to speak about Beautiful Code to
Beautiful Minds.
But this requires the Beautiful Mind to recall that in most, but not
all, stack virtual machines, value parameters on the stack can be
used, slyly and in what I think is an Ugly way, as "work" variables.
Unless of course the C virtual machine is a piece of hardware, on
which value parameters placed on the stack cannot be modified...or can
be only slowly, while new values (function end results) can be pushed
quickly. Such a machine, implemented in embedded hardware, is
conceivable.
In the interests, then, of using C as you want to use it, as an
Algol-style publication language, you should have modified Rob's code
to copy text to a private work variable. The "waste" of a cycle would
save the program reader's time.
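Here is a minimal sketch of that change. The logic is the matcher from the essay (match, matchhere, matchstar), but match and matchstar copy text into a local cursor before stepping through it, so no parameter doubles as a work variable. The const qualifiers and the name cursor are my additions, not Pike's.

```c
/* Sketch: Pike's matcher with each text parameter copied into a local
   work variable ("cursor") instead of being advanced in place. */
static int matchhere(const char *regexp, const char *text);

/* matchstar: search for c*regexp at beginning of text */
static int matchstar(int c, const char *regexp, const char *text)
{
    const char *cursor = text;   /* private work variable */
    do {   /* a * matches zero or more instances */
        if (matchhere(regexp, cursor))
            return 1;
    } while (*cursor != '\0' && (*cursor++ == c || c == '.'));
    return 0;
}

/* matchhere: search for regexp at beginning of text */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* match: search for regexp anywhere in text */
int match(const char *regexp, const char *text)
{
    const char *cursor = text;   /* private work variable */
    if (regexp[0] == '^')
        return matchhere(regexp + 1, cursor);
    do {   /* must look even if string is empty */
        if (matchhere(regexp, cursor))
            return 1;
    } while (*cursor++ != '\0');
    return 0;
}
```

The behavior is unchanged (match("C*", "DD") still returns 1); the cursor only changes how the code reads.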
C, as a language for talking about algorithms, is pernicious because
it requires the reader to "understand" too much, viz., that it is a
high level assembler language which obscures as much as it shows.
I understand your point that in Java and in C Sharp, the programmer
would have been compelled (boo hoo) to use a string object, and that
this might cost more time...at least on an unoptimized platform.
But in the exposition after the code, you follow the rule of the Duke
of Wellington: never apologize, never explain. I looked in vain for an
explanation, addressed to people who don't use C, of why text may be
used as an index in the way you use it.
You also fail to mention that the program doesn't even work for modern
strings, only for what we call in .NET sbyte strings: strings o' bytes.
To call Rob's code from .NET, I have to create an unsafe interface.
In 1998, it was painfully arguable that the world was still interested
in scanning strings o' bytes using regular expressions that are also
strings o' bytes.
Missing is Bjarne Stroustrup's Danish internationalism. In 2004 in
Shenzhen I read code which applied regular expressions to Chinese text
in double-byte Unicode.
C programmers claim that their language is Unicode-aware because they
are, and they can hack it. This isn't the point. The point is
information hiding, and the Java and C Sharp strings can handle
international input. Rob's code cannot; it will be keyed in, as
Beautiful Code, by tyro programmers, and it will fail. As in the case
of the use of the text variable, you don't discuss this issue.
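As a sketch of the minimum that Unicode awareness would take, here is the same algorithm transcribed to C's wide characters. It assumes the input has already been converted to wchar_t (for example with mbstowcs after setlocale), and the names wmatch, wmatchhere and wmatchstar are hypothetical. It is only as good as wchar_t on the platform: on typical Linux systems wchar_t is 32 bits and holds any code point, but on Windows it is 16 bits, so characters outside the Basic Multilingual Plane occupy two units and "." would match half of one.

```c
#include <wchar.h>

/* Sketch: Pike's matcher transcribed to wchar_t. Assumes text and regexp
   are already decoded wide strings; one wchar_t is one character only
   where wchar_t is wide enough (e.g. 32 bits). */
static int wmatchhere(const wchar_t *regexp, const wchar_t *text);

/* wmatchstar: search for c*regexp at beginning of text */
static int wmatchstar(wchar_t c, const wchar_t *regexp, const wchar_t *text)
{
    const wchar_t *cursor = text;
    do {   /* a * matches zero or more instances */
        if (wmatchhere(regexp, cursor))
            return 1;
    } while (*cursor != L'\0' && (*cursor++ == c || c == L'.'));
    return 0;
}

/* wmatchhere: search for regexp at beginning of text */
static int wmatchhere(const wchar_t *regexp, const wchar_t *text)
{
    if (regexp[0] == L'\0')
        return 1;
    if (regexp[1] == L'*')
        return wmatchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == L'$' && regexp[1] == L'\0')
        return *text == L'\0';
    if (*text != L'\0' && (regexp[0] == L'.' || regexp[0] == *text))
        return wmatchhere(regexp + 1, text + 1);
    return 0;
}

/* wmatch: search for regexp anywhere in text */
int wmatch(const wchar_t *regexp, const wchar_t *text)
{
    const wchar_t *cursor = text;
    if (regexp[0] == L'^')
        return wmatchhere(regexp + 1, cursor);
    do {   /* must look even if string is empty */
        if (wmatchhere(regexp, cursor))
            return 1;
    } while (*cursor++ != L'\0');
    return 0;
}
```

With this version, a pattern such as L"\u4e2d." matches one Chinese character followed by any single character, rather than a fragment of a multibyte sequence.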
As a relatively minor cavil, I know that it has been very hip, since
the days of the PDP-8, to use lower case. The problem is that
matchhere is ugly because it uses a double h in a way that rarely
occurs in English: it's Klingon, so why not bite the bullet and use
matchHere?
In my opinion, camelCase is a thing of beauty in that it avoids the
class bias of Proper case, refusing the first letter a special honor
merely because of primogeniture, but preserving needed breaks between
words. I understand that MATCH_HERE or match_here would be Coyote Ugly
in a use of underscore common in the IBM, PL/I and Rexx tradition, but
camelCase is to me an excellent compromise.
Also, why couldn't Rob have returned the index of the place where the
regular expression matches, and, to be consistent with zero-origin
indexing, -1 for no hit? It's not elegant, in a real timesaving way, to
do all the work the code does and then return such a thin, cold
response to callers whose next question will be where the heck the
matching string starts!
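A minimal sketch of the interface proposed above, assuming Pike's internal helpers are kept as they are: match_index (a hypothetical name) returns the zero-origin index where the leftmost match starts, or -1 if there is none.

```c
#include <stddef.h>

/* Sketch: Pike's algorithm with a hypothetical match_index entry point
   that reports where the leftmost match starts. */
static int matchhere(const char *re, const char *text);

/* matchstar: search for c*re at beginning of text, shortest first */
static int matchstar(int c, const char *re, const char *text)
{
    const char *cursor = text;
    do {
        if (matchhere(re, cursor))
            return 1;
    } while (*cursor != '\0' && (*cursor++ == c || c == '.'));
    return 0;
}

/* matchhere: search for re at beginning of text */
static int matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;
    if (re[1] == '*')
        return matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere(re + 1, text + 1);
    return 0;
}

/* match_index: zero-origin index where the leftmost match starts, or -1 */
ptrdiff_t match_index(const char *re, const char *text)
{
    const char *cursor = text;
    if (re[0] == '^')
        return matchhere(re + 1, text) ? 0 : -1;
    do {   /* must look even if string is empty */
        if (matchhere(re, cursor))
            return cursor - text;
    } while (*cursor++ != '\0');
    return -1;
}
```

With this interface, match_index("C*", "DD") returns 0: zero Cs is a match at the very start.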
It's very confusing to have the code return 1 (affirmative) for the
regular expression C* on the string DD until you remember "zero, one,
or more", but there's no helping this. The reader has to understand
the basics.
I like the way you show how to use recursion simply in the real world;
too many programmers say "ooooo recursion, scary and time-wasting,
don't make me think, aaaargh".
But in general, I think your C has become a lingua franca and as such
a Frankfurt, a walled city with its own shibboleths.
This to me is contrary to the open, humanistic and critical spirit of
Kernighan's early writings on programming style, which deconstructed
bad Fortran and showed how it could be rewritten even in Fortran
itself.
I may not have realized that my (computing) generation was engaged not
in liberating computing from the shackles of privatized idiocy, but
merely in an Oedipal destruction of the 1960s generation of the IBM
fathers, creating, sad to say, its own shibboleths and orthodoxy...in
which even such a brilliant man as Kernighan stays within a discourse
boundary in which he cannot explain that "we use the value parameter
text as the index; don't worry, your copy of text will not be changed,
because text is passed by value".
This is a computing culture dominated by American private corporations
which use computing to obscure. I have learned, for example, that the
securitized mortgages which are creating a global economic crisis
cannot be reverse engineered in such a way as to find the original
debtors, and reprice the security by determining the best risks.
This was probably because the programmers were under orders to do
things in an hour or less, as Rob Pike was praised for doing, and they
appear to have discarded the idea of keeping keys in new tranches.
I'm not an expert on this, but I have enough experience to somehow
intuit that during the Big Party, yuppie scum who were getting rich
were bullying programmers to emulate Rob, and discard "unnecessary
features", whether international strings, or an audit trail.
Dijkstra is rolling in his grave. While he was alive, there was to my
knowledge absolutely no two-way communication between Dijkstra and
guys like Brian because in the American corporate context, Dijkstra's
honest criticism was a speed bump.
Brian's essay, Rob's code, and the rest of the book are a
disappointment, I'm afraid.
Here's the code wrapped in a value class for Microsoft C++ Visual
Studio ...
...and here is the final command-line executable version for Microsoft
Visual C++ in Visual Studio, tested, but not (yet) comprehensively. I've
CHANGED Pike's code to tell the caller the starting index and the
length of the string that "satisfies" the regular expression, and I've
included a somewhat extensive main() procedure to:
* Display all values in and out
* Enable quick debugging in Visual Studio, by defaulting to easily
changed compile-time values for testing when run in debug mode inside
the IDE
The main intent of returning the index and length of the satisfier
string is to allow reuse of All This Useless Beauty, but it's
incumbent on the caller to check for a zero satisfierLength when
iterating through a string to find all occurrences of a regex. Also,
Pike's code follows the lazy, shortest-match philosophy, which will
return zero-length matches when the asterisk is used. It's
counterintuitive, but a regex of CC* and a string of CC gets you a
match length of 1 for this reason.
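A minimal sketch of that caller-side loop, assuming a match_at function with the same start/length contract as the modified code below but written in plain C. All names here are hypothetical. The helpers return a pointer just past the match (or NULL) instead of updating a shared pointer, so a failed attempt cannot disturb the position. When a match has zero length, the loop steps forward one character so that it can't spin forever on the same spot.

```c
#include <stddef.h>

/* Sketch with hypothetical names. Each helper returns a pointer just past
   the matched text, or NULL, so a failed attempt cannot disturb the
   caller's position in the string. */
static const char *matchhere(const char *re, const char *text);

/* matchstar: c*re at the beginning of text, trying zero c's first (lazy) */
static const char *matchstar(int c, const char *re, const char *text)
{
    const char *cursor = text;
    const char *end;
    do {
        if ((end = matchhere(re, cursor)) != NULL)
            return end;
    } while (*cursor != '\0' && (*cursor++ == c || c == '.'));
    return NULL;
}

/* matchhere: match re at the beginning of text */
static const char *matchhere(const char *re, const char *text)
{
    if (re[0] == '\0')
        return text;
    if (re[1] == '*')
        return matchstar(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0' ? text : NULL;
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return matchhere(re + 1, text + 1);
    return NULL;
}

/* match_at: leftmost match in text; sets *start and *length on success */
int match_at(const char *re, const char *text,
             const char **start, size_t *length)
{
    const char *cursor = text;
    const char *end;
    if (re[0] == '^') {
        if ((end = matchhere(re + 1, text)) == NULL)
            return 0;
        *start = text;
        *length = (size_t)(end - text);
        return 1;
    }
    do {
        if ((end = matchhere(re, cursor)) != NULL) {
            *start = cursor;
            *length = (size_t)(end - cursor);
            return 1;
        }
    } while (*cursor++ != '\0');
    return 0;
}

/* count_matches: number of non-overlapping matches, left to right */
size_t count_matches(const char *re, const char *text)
{
    size_t count = 0;
    const char *start;
    size_t length;
    while (match_at(re, text, &start, &length)) {
        count++;
        if (re[0] == '^')
            break;                   /* an anchored regex can match only once */
        if (length > 0)
            text = start + length;   /* resume just past a nonempty match */
        else if (*start != '\0')
            text = start + 1;        /* step past an empty match */
        else
            break;                   /* empty match at end of string: done */
    }
    return count;
}
```

With lazy stars, empty matches are counted too: count_matches("C*", "dC") is 3 (empty at index 0, empty at index 1 because zero Cs is tried first, and empty at the end), which shows how much of the burden falls on the caller.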
In fact, the leftmost-shortest match philosophy in use will (correctly)
return a match of zero length at index 0 in this code for "C*" as the
regex and "dddddCCCC" as the input string. If the caller keeps
iterating the call until a nonzero-length match is found, incrementing
the pointer into the string, much of the time saved in recursion is
wasted in iteration (and grey hair) in user land.
The shortest string philosophy is a mistake because it is
counterintuitive. It isn't Beautiful.
There's a real divergence between what Unix, Linux and C programmers
think about regular expressions and their mathematics and what users
intend. The user's INTENT in entering a regular expression that can
match zero characters, such as C*, is twofold: to be cool if it isn't
found and consider that a match, or else to use the first nonempty
string of Cs. Unfortunately, CC*, the mathematical equivalent of C+
(at least one C, in a sensible regex), will, using the code below,
return a length of one for CCd and not two, because zero is less than
one.
Pike's code has no bugs because it creates its own reality. However,
it and its clones define the user's world and give her no chance to
understand what's going on. I can only speculate how often this
English-centric code, in use at the CIA, failed to catch bad guys!
A Beautiful regular expression parser would take far more than an hour
(I adapted the code in about two hours). It would let the user choose
leftmost-first or rightmost-first matching, and longest or shortest
match. It wouldn't assume that all strings are strings of bytes. It
would support languages that run right to left, it would support the
application of its operators to parenthesised sequences, and it
WOULDN'T silently use the stack as its playtime funtime work area.
Above all, it would EXPLAIN its algorithms and provide enough output,
as I try to do here, to the user, treating her with dignity and
respect.
Comin' right up, time permitting, in C Sharp, with hooks to this code
to do speed comparisons.
// kernighanRegexC.cpp : main project file.
#include "stdafx.h"
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#define REPCOUNT_BACKUP ("1")
#define REGEX_BACKUP ("^C")
#define TESTSTRING_BACKUP ("dC")
using namespace System;
public value class kernighanRegexC
{
public:
    /* match: search for regexp anywhere in text; on success *start points
       to the satisfier string and *length holds its length */
    static int match(char *regexp, char *text, char **start, int *length)
    {
        *length = 0;
        *start = text;
        if (regexp[0] == '^')
        {
            return matchhere(regexp+1, &text)
                ?
                (*length = (int)(text - *start), 1)
                :
                (*start = 0, 0);
        }
        do { /* must look even if string is empty */
            *start = text;
            if (matchhere(regexp, &text))
            {
                *length = (int)(text - *start);
                return 1;
            }
        } while (*text++ != '\0');
        *start = 0;
        return 0;
    }
private:
    /* matchhere: search for regexp at beginning of text. On success *textp
       points just past the match; on failure it is left unchanged, so the
       callers can retry from the same position */
    static int matchhere(char *regexp, char **textp)
    {
        if (regexp[0] == '\0') return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, textp);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return **textp == '\0';
        if (**textp != '\0' && (regexp[0] == '.' || regexp[0] == **textp))
        {
            char *saved = *textp;      /* undo the advance if the rest fails */
            ++(*textp);
            if (matchhere(regexp+1, textp))
                return 1;
            *textp = saved;
        }
        return 0;
    }
    /* matchstar: search for c*regexp at beginning of text */
    static int matchstar(int c, char *regexp, char **textp)
    {
        char *saved = *textp;          /* restored if no count of c's works */
        do { /* a * matches zero or more instances */
            if (matchhere(regexp, textp))
                return 1;
        } while (**textp != '\0' && (*(*textp)++ == c || c == '.'));
        *textp = saved;
        return 0;
    }
};
int main(int argc, char *argv[])
{
    clock_t start;
    int repeats, reps;
    int result = 0;
    char *regexAddr;
    char *stringAddr;
    char *satisfierAddr = 0;
    int satisfierLength = 0;
    if (argc != 4)
    {
        printf("My syntax is kernighanRegexC <repeats> <regularExpression> <string>\n");
        printf("I'll use my backup test values\n");
        repeats = atoi(REPCOUNT_BACKUP);
        regexAddr = (char *)REGEX_BACKUP;
        stringAddr = (char *)TESTSTRING_BACKUP;
    } else
    {
        repeats = atoi(argv[1]);
        regexAddr = argv[2];
        stringAddr = argv[3];
    }
    printf("Applying the regular expression '%s' to the string '%s' %i time(s)\n",
           regexAddr, stringAddr, repeats);
    start = clock();
    for (reps = 0; reps < repeats; reps++)
    {
        result = kernighanRegexC::match(regexAddr,
                                        stringAddr,
                                        &satisfierAddr,
                                        &satisfierLength);
    }
    double duration = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (reps > 0)
    {
        printf("Satisfier string was %sfound\n", result == 0 ? "not " : "");
        if (result == 1)
        {
            printf("Zero-origin index of satisfier string: %i\n",
                   (int)(satisfierAddr - stringAddr));
            printf("Length of satisfier string: %i\n", satisfierLength);
            printf("Starting address of test string: %p\n", (void *)stringAddr);
            printf("Starting address of satisfier string: %p\n",
                   (void *)satisfierAddr);
        }
        printf("Time taken was about %2.1f seconds\n", duration);
    }
    return 0;
}