Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- From: spinoza1111 <spinoza1111@xxxxxxxxx>
- Date: Thu, 27 Dec 2007 08:55:22 -0800 (PST)
Here is a summary of what I think are the flaws in the code Rob Pike
wrote for Brian Kernighan, and which Brian presents as Beautful Code
in a book of that name, published this year by O'Reilly, on page 3 (an
unindented copy of Rob's code as keyed by me, and wrapped in a C++
class, is at my developerDotStar blog). I read C but have long since
abandoned it because I don't think Beautiful Code can be written in C:
the Pike code, I think, is an example of code that only seems
Beautiful.
(1) Pike's code doesn't implement a regular expression interpreter
insofar as "regular expressions" have a formal, mathematical meaning.
For example, it makes no provision for a character which must occur at
least once.
(2) It is idiomatic C, which uses a parameter passed by value as the
changeable index into the string it points-at and it expects the user
to grok the fact that she has a useful value in that value parameter
(the start of the text conforming to the regular expression),
something unexpected outside of C. If the user passes the text as a
string literal, the useful point at which the regular expression
occurs is lost.
(3) While advertised as a string processor, it does not handle modern
Unicode or double-byte strings containing international input.
(4) Its comments are unilluminating especially as regards need-to-know
the fact that its value parameter contains a useful result (the start
position of the regular expression), and Brian's discussion in
Beautiful Code is unhelpful. Of course, as noted above, this address
is lost when a string literal is passed.
(5) The length of the substring of characters that satisfies the
regular expression is lost.
(6) Kernighan does express, in the text of the essay, the concern that
the code, which does heavy recursion proportional to the string
length, might run slowly or overrun stack limits.
But no consideration exists in the code that a semi-beautiful way
exists to avoid most recursion. If the regular expression does not
start with a string start carat or string end dollar sign, its first
character, if not followed by an asterisk, is a "handle" which which
can be scanned-for in a nonrecursive loop or by a strcspn function.
Even if the first character (that's not dollar or carat) is followed
by an asterisk, if the asterisked character occurs frequently, the
code can search for the first character, and return a match
immediately upon finding it, or, entering the recursive code upon
failure.
What would be Beautiful about this? The fact that it's not the sort of
thing an optimizing compiler would "think" of, whereas the apparent
efficiency of Pike's code is often discovered by an optimizing
compiler (although an optimizing compiler would not "see" the utility
of recursion in an iterative solution, to be sure).
It would, I think, amortize the cost of using a platform, whether Java
or C Sharp, that handles real strings through an abstraction.
Please feel free to point out where I've gone off the rails, since I
have such respect for Kernighan that I'm astonished that he feels that
this snippet is Beautiful Code.
.
- Follow-Ups:
- Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- From: spinoza1111
- Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- References:
- Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- From: spinoza1111
- Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- Prev by Date: Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- Next by Date: Re: OOP alternative
- Previous by thread: Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- Next by thread: Re: Brian Kernighan, maybe I'm not worthy, maybe I'm scum
- Index(es):
Relevant Pages
|