Re: Program compression



Date: Sun, 22 Jun 2008 13:13:13 +0100
Why this response is so belated:
<http://groups.google.com/group/misc.misc/msg/cea714440e591dd2>
= <news:rem-2008jun25-003@xxxxxxxxx>
From: Jon Harrop <j...@xxxxxxxxxxxxxxxxx>
Haskell, SML, OCaml, Mathematica, F# and Scala all allow real
problems to be solved much more concisely than with Lisp. Indeed, I
think it is difficult to imagine even a single example where Lisp
is competitively concise.

What does "solved much more concisely" mean??

The solution to any problem of significant complexity requires a
lot of code. Some of the code is in the core language, some is in
the libraries that ship with the core language, some is in
third-party libraries or vendor options, and some needs to be
written because that particular aspect of the problem has never
been solved before and there's always a first time for anything
that gets done. When you include *all* that code, there's no way
the totality of all that can be "concise".

Each language makes a design choice which kinds of tools are likely
to be generally useful and hence ought to be included in the core
langauge or supplied libraries, and everything else is left for
later (add-ons from vendor, third-party, and do-it-yourself).

But regardless of how much of the code is supplied by the vendor
and how much is supplied by third parties, in the end when all the
code is written, and it's time to actually run an application, the
command to run it is typically very short: You give the name of the
utility, you give a pointer to the file containing the
configuration parameters, you give a pointer to the file containing
the dataset (or that pointer is contained within the configuration
file), and you press ENTER or click the SUBMIT button to confirm
you want that task to actually execute.

So why don't you say what you really mean, that some particular
application types are better represented in published libraries in
your favorite language than in mine (Lisp)? And people who like
Lisp will respond that some *other* application types are better
represented in published libraries in Lisp language than in
*yours*? Why don't you compile a list of all the major application
types that you feel are needed by the largest number of potential
customers/users, and allow the rest of us to propose additions to
your list? For example, IMO the most commonly useful application
types are:
- Business operations (accounts receivable/payable, inventory, payroll)
- Text editing (both plain text and "word processing" formatted text)
- Web browsing/surfing/searching
- E-mail (both plain text and MIME multi-media) including protection against spam
- Conferencing and other public/group forums
- Security, including protection against computer viruses/worms/trojans/bots
There are probably five or ten more that I could add if I had the
time/energy to think of them. Anyway, after we get a complete list,
based on concensus of everyone reading comp.programming on a
regular basis and seeing this thread, *then* you need to say how
well your favorite language addresses these common application
needs.

Note that algebraic datatypes are not among the major applications I listed.

Consider the trivial example of defining a curried "quadratic"
function. In Common Lisp:
(defun quadratic (a) (lambda (b) (lambda (c) (lambda (x)
(+ (* a x x) (* b x) c)))))

I've never in my entire life found an application area where such a
tool was needed or even useful. Can you please cite what real-world
problem was aided by currying a quadradic or other function? If
it's a rare problem, then the two lines of code above are entirely
concise enough. I anybody needed to do a *lot* of that apparent
pattern of nested lambdas with one parameter each, it would be
simple to write a macro whereupon the syntax for actual usage would
reduce to:
(defcurry quadratic (a) (b c x) (+ (* a x x) (* b x) c))
Actually the expression for the quadradic is cruddy: Imagine trying
to extend that to a high-degree polynomial:
(defcurry quintic (a) (b c d e x)
(+ (* a x x x x x) (* b x x x x) (* c x x x) (* d x x) (* e x x) f))
Wouldn't it make more sense to chain the multiplications like this:
(defcurry quadratic (a) (b c x)
(+ c (* x (+ b (* x (+ a))))))
(defcurry quintic (a) (b c d e x)
(+ f (* x (+ e (* x (+ d (* x (+ c (* x (+ b (* x (+ a))))))))))))
Or for that matter, if all you're generating are standard
polynomials then you can abstract the pattern further:
(defcurrypoly quadratic (x) (a b c))
(defcurrypoly quintic (x) (a b c d e f))

In F#:
let inline quadratic a b c x = a*x*x + b*x + c

That's more verbose than my use of defcurrypoly.
Does F# provide a way to carry out the abstraction of defcurrypoly?
Also, it's absurd that the keyword for generating a curried
function is "inline" instead of "curry". I know that "curry" is
CompSci jargon, but at least it's not grossly misleading as to what
it does the way the "inline" keyword in F# is!! Anybody who doesn't
understand the jargon can look it up in Google or WikiPedia and
learn what the jargon means and not be confused. By comparison,
anybody reading code that mentions "inline" would be very unlikely
to guess that he/she should read that as what "curry" means.
(Of course if you're lying about what the "inline" keyword does in
F# then the fault lies with you, not with F#.)

By the way, sometimes the biggest gripe about mis-named functions
in Lisp is CONS CAR CDR. Like CONS is an abbreviation for
"construct", which begs the question *what* precisely is being
constructed, and CAR/CDR are totally obscure. IMO better names for
the object itself would be Standard Pair, abbreviated SP, and then
the three functions could be named MAKE-SP SP-LEFT SP-RIGHT and be
totally clear sensible to newbies. (The name for the print
expression for Standard Pair could remain "Dotted Pair", and the
connection between the internal Standard Pair and the printed
Dotted Pair would make a lot more sense to beginners.)
Note there are a lot of different kinds of (ordered) pairs available in Lisp:
- Pair of characters in a string, such as that which prints as "az"
- Pair of elements in a linked list, such as that which prints as (1 2)
- Pair of elements in a vector, such as that which prints as #(1 2)
- Key-value pair in a hash table, which has no print form but it's
the result of something like (setf (gethash :SURNAME *ht*) "Robert")
It's just that the kind of ordered pair constructed by CONS is the
*standard* kind that is essential to Lisp itself. Maybe *basic*
instead of *standard* would be a better term, but then the asswipe
who invented BASIC might sue for infringement. Maybe *common* would
be a better term? (Play on first word of "Common Lisp")

Suppose you want to hand-code a list
of data to be processed, and then map some function down that list.
In Lisp all you need to do is
(mapcar #'function '(val1 val2 val3 ... val4))
where the vals are the expressions of the data you want processed
and function is whatever function you want applied to each element
of the list. You just enter that into the REP and you're done.
Lisp:
(mapcar #'function '(val1 val2 val3 ... val4))
OCaml and F#:
map f [v1; v2; v3; .... vn]
Mathematica:
f /@ {v1, v2, v3, ..., vn}

OK, for simple lists those three languages equal Lisp in conciseness.
So what if you want to pass a more complicated structure to some function.
For example, how do you pass a nested ASSOC list to a function that
automatically traverses it matching keys against given keys to find
the correct branch at each level? Suppose that the function is
called fks which is an abbreviation for Find Key Sequence (in such
a nested assoc list implied).
(defparameter *nal1*
'(:TELEPHONECODES
(:USA "1" (:OREGON "503" "541" "971") (:NEVADA "702" "775")
(:NORTHDAKOTA "701"))
(:CANADA "1" (:MANATOBA 204))
(:BELGIUM "32" (:ANTWERP "3") (:BRUSSELS "2"))
(:CROATIA "385" (:DUBROVNIK "20") (:ZAGREV "1"))
))
(fks '(:TELEPHONECODES :CROATIA :DUBROVNIK) *nal1*) => ("385" "20")
(fks '(:FOOBAR :USA)) ;signals error, toplevel tag wrong
(fks '(:TELEPHONECODES :MEXICO :CANCUN) ;signals error, inner tag #1 missing
(fks '(:TELEPHONECODES :USA :TEXAS) ;signals error, inner tag #2 missing

Would that be equally simple to directly code in those three
languages you suggest?

... modern functional languages like Haskell, OCaml and F# ...
They [users] clearly do not have a problem with the supreme brevity
of these languages.

In cases where those particular languages are not already brief as
delivered, how easy is it to define a shorter syntax if you're
going to want a particular operation very terse because it's going
to be done a lot? I.e. how do they compare with Lisp's ability to
define macros?

... I was listing some of Lisp's most practically-important
deficiencies. Lack of static typing in the language is one.

I disagree. I pointed out how static typing doesn't really solve
the problem at all because it's incapable of dealing with
intentional data types which is where the *real* problem lies.
Static type-checking of internal datatypes doesn't hardly begin to
solve the real problem. It's just an advertising gimmick that turns
out to be more cost than value. (The exception is in tight loops
where some machine type happens to work efficiently and where that
particular loop eats up most of the CPU time, which is precisely
where Lisp allows type declarations and consequent CPU-efficient
code as an *option* where it would actually be worth the cost.)

Lack of concurrent GC in all Lisp implementations is another.

I gotta look up that to see what the term means ... first match is:
<http://xiao-feng.blogspot.com/2007/03/about-mostly-concurrent-gc.html>
That gives me some idea, but it doesn't seem like what you desire, right?

Skipping a few search results that don't look worth studying, I come to:
<http://groups.google.com/group/comp.lang.functional/browse_thread/thread/a288a989c5cd6acb/50b6ca2607173c91>
"So concurrent GC makes communication cheaper at the expense of everything else?"
That was written by none other than Jon D Harrop!
(The point was that if you have a shared mutable object, then you
need concurrent GC, but usually it's better design to have each
thread have its own mutable data and instead marshall data to pass
back and forth as needed, but then passing large objects back and
forth can cost a lot.)
There's a compromise between shared mutable data and passing large
data objects back and forth, which is to have each mutable data
object owned by just one process and everyone else must mutate or
examine it via queries in the OO style. So then only the handle on
the object, the parameters of the query (which may be handles on
other objects), and the return value (some piece of the object, or
a success/fail flag) need be passed between processes. It seems to
me that having an actual shared-mutable-object is almost as bad as
having a global variable that is mutable by multiple processes
(except for semaphores/interlocks/etc. which really *do* need to be
shared at a low level to make multi-processing work in the first place).
Later in that thread, it's pointed out that sharing memory doesn't
scale to really large applications, so you need to marshall data
(even if it's only ints) to pass between distant processors.
So in essence shared memory is a bad idea in large applications
where you need parallization in the first place, so you should
avoid shared memory, and thus there's no need for concurrent GC.
Later in that same thread, Harrop said "I'm hard pressed to find a
practically-important application that will be better off with a
concurrent GC." So if concurrent GC is essentially worthless, why
is Lisp's lack of it a reason for disliking Lisp??

<http://portal.acm.org/citation.cfm?id=1345206.1345238&coll=GUIDE&dl=GUIDE>
Hmm, this paper claims that "concurrent GC can share some of the
mechanisms required for transactional memory" thus as mechanisms
for the latter are made more efficient, same applies to former. But
I don't have time to look up yet another jargon term at bedtime to
know what the author is talking about.

So anyway, whether concurrent GC is worth the trouble or not, is debatable.

Also lack of threads,

We're in agreement that the ANSI standard didn't include threads,
which is a major deficiency if you look to fully portable software
between different implementations of CL. Allegedly at that time the
Lisp community didn't have enough experience with threads to be
sure what version of them ought to be standardized. IMO the best
strategy is to keep the code specific to thread management totally
disjoint from the business logic so that thread management can be
simply replaced when porting to a different implementation.

weak references,
[suffix "are all fundamental deficiencies of the language" for
these next several items]

It seems to me that a GC hook could be used to purge transient data
that hasn't been referenced recently. How easy is it to actually
implement this and get it working well with only ANSI CL to work
with?

finalizers,

You mean like to automatically close a system resource whenever the
Lisp hook for it is GC'd? Normally system resources like that are
lexically bound, whereby UNWIND-PROTECT is sufficient to make sure
they get closed when that block of code is exited. Is there
*really* a good reason to have such resources kept around per data
object instead of per block-entry/exit? Somehow it seems to me a
dangerous design to have system resources that get closed at
"random" times (when memory runs short) instead of at clearly
defined times. But if you can think of a really good use for random
closure, I may reconsider my opinion.

asychronous computations,

I'm not sure what you mean. Don't compilers already generate
efficient CPU data+instruction parallel-pipelining whenever they
detect that two operations are done in close proximity neither of
which depends on the result of the other?

memory overflow recovery,

That might be moot on typical modern architectures that have a
small CPU cache and a large RAM and a truly gigantic virtual
address space. Locality of reference to avoid thrashing is a
problem long before total virtual memory or disk space for swapping
is ever exceeded.

Please write up a Web page that explains:
- What you precisely mean by "concurrent GC" (or find a Web page
that somebody else wrote, such as on WikiPedia, that says
exactly the same as what *you* mean, and provide the URL plus a
brief summary or excerpt of what that other Web page says).
See Jones and Lins "Garbage Collection: algorithms for automatic dynamic
memory management" chapter 8.

I don't have access to that. Is there a summary on the Web?

In a serial GC, the program often (e.g. at every backward branch)
calls into the GC to have some collection done. Collections are
typically done in small pieces (incrementally) to facilitate
soft-real time applications but can only use a single core. For
example, OCaml has a serial GC so OCaml programs wishing to use
multiple cores fork multiple processes and communicate between them
using message passing which is two orders of magnitude slower than
necessary, largely because it incurs huge amounts of copying that
is not necessary on a shared memory machine:

On a shared-memory machine, how difficult is it to prevent
different processes from trying to read and write the same object
at the same time, whereupon the writer has only partly overwritten
the object when the reader starts to read it and sees pieces of old
and new object scrambled together?

With a concurrent GC, the garbage collector's threads run
concurrently with the program threads without globally suspending
all program threads during collection. This is scalable and can be
efficient but it is incredibly difficult to implement correctly.
The OCaml team spent a decade trying to implement a concurrent GC
and never managed to get it working, let alone efficient.

Maybe it's just too tricky (to implement correctly) to be worth
doing? Maybe the Lisp implementors were wise not to make that
gamble? Maybe it's "not ready for prime time", i.e. until somebody
finds a way to get it right, it's not worth fretting over the fact
that it hasn't been done, it's better to just live without it?

- List several kind of applications and/or API tools that are
hampered by lack of whatever that means.
Any software that requires fine-grained parallelism for
performance will be hampered by the lack of a concurrent GC.

List five such applications, and convince me that any technique
other than fine-grained parallelism is too inefficient for each
such application. Also convince me in each case that circular
structure is absolutely necessary, because after all if there are
no pointer loops then reference counting works just fine and no GC
is needed at all. Convice me that a hybrid system, using a symbol
table of tops of structures, with no loops within any of the
structures except that symbols in the symbol table may be
mentionned within structures, would not be sufficient for each such
application.

2. Even though Lisp's forte is as a language laboratory, Lisp has
so little to offer but costs so much
Um, some implementations of Common Lisp are **free** to download
and then "use to your heart's content". How does that cost too much???
Development costs are astronomical in Lisp compared to modern
alternatives like F#, largely because it lacks a static type system

As I pointed out before, static type-checking doesn't solve enough
of the "problem" to be worth the trouble it causes. It doesn't
(can't) handle intentional types. Any language that imposes static
typing is what slows down development.

but also because it lacks language features like pattern matching,

Here we go again.

decent developer tools like IDEs

I used to use Macintosh Allegro Common Lisp on my Mac Plus.
Wasn't that a IDE?

There are no decent Lisp implementations for .NET.

Why does anybody care? What, if anything, is good about .NET?

Finally, there is no way you'll ever turn a profit because the
market for commercial third-party software for Lisp is too small.

Consider leased server-side applications (CGI or PHP). How would
the customer know or care whether Lisp or some other language
happens to be used to implement it? Just about everyone on the net
uses the Google search engine. It's advertising based rather than
subscription, but ignore that detail for a moment. Do you know what
language(s) are used to implement all its internals? Do you care?

I think you should aspire to earn money directly from customers
rather than asking people to give you money to develop your
software.

I like that idea very much. Will you find me a potential customer
who wants some serverside application that doesn't already exist?
.