Re: Program compression



Robert Maas, http://tinyurl.com/uh3t wrote:
From: Jon Harrop <j...@xxxxxxxxxxxxxxxxx>
The solutions are shorter in modern FPLs.

OK, let's say you have a text string which contains the notation
for a nested list starting from the beginning of the string.

"The" notation? You mean Lisp's notation?

That's fine but it is of no practical interest because real data is rarely
already in Lisp syntax. For a more useful comparison you must consider
other formats.

How
many lines of code, in various modern FPLs, would it take to parse
that nested-list notation to produce an actual nested list
structure, and also report back in another value the position in
the string where the parse left off at the end of the nested-list
notation? In Common Lisp it's just one line of code:
(multiple-value-setq (ls ei) (read-from-string ts))
where ts has the given text string,
ls gets the resultant list structure,
and ei gets the resultant end index.

In OCaml with OCaml's notation:

Gram.parse expr Loc.ghost (Stream.of_string ts)

Now let's say you want to pick up where you left off, because after
that first nested list notation there's another within the same
text string you were given. How many lines of code in your favorite
FPLs? In Common Lisp, it's another one line of code:
(multiple-value-setq (ls2 ei2) (read-from-string ts :start ei))

That is just reading twice from the same stream.

My point is that different languages are based on different things
being important enough to put into the core language, but with
add-on libraries it's possible to make a lot of things be
essentially the same length in a whole bunch of languages. If you
don't mind Unix-style ultra-short names for commonly use utilities,
and you do a lot of this kind of stuff, if you really want to make
your code as short as possibly, you can use macros to abbreviate
the special operator and function used above, so the two lines of
code will easily fit into just one run-on line of code:
(mvs (ls ei) (rfs ts)) (mvs (ls2 ei2) (rfs ts :start ei))

How about something of practical relevance: custom grammars.

Consider parsing Mathematica expressions including lists, rules, sums,
products, powers and factorials with associativity and precedence. In
OCaml:

open Camlp4.PreCast;;

let mma = Gram.Entry.mk "mma";;

EXTEND Gram
mma:
[ "rule" LEFTA
[ e1 = mma; "->"; e2 = mma -> `Rule(e1, e2) ]
| "add" LEFTA
[ e1 = mma; "+"; e2 = mma -> `Add(e1, e2) ]
| "mult" LEFTA
[ e1 = mma; e2 = mma -> `Mul(e1, e2) ]
| "pow" RIGHTA
[ e1 = mma; "^"; e2 = mma -> `Pow(e1, e2) ]
| "fact" LEFTA
[ e = mma; "!" -> `Factorial e ]
| "simple" NONA
[ n = INT -> `Int(int_of_string n)
| s = UIDENT -> `Sym s
| "{"; fs = LIST0 mma SEP ","; "}" -> `List fs
| "("; e = mma; ")" -> e ] ];
END;;

For example:

# Gram.parse mma Loc.ghost (Stream.of_string "{A->B,123,C D^E+F}");;
- : _[> `Add of 'a * 'a
| `Factorial of 'a
| `Int of int
| `List of 'a list
| `Mul of 'a * 'a
| `Num of int
| `Pow of 'a * 'a
| `Rule of 'a * 'a
| `Sym of string ]
as 'a
=
`List
[`Rule (`Sym "A", `Sym "B"); `Int 123;
`Add (`Mul (`Sym "C", `Pow (`Sym "D", `Sym "E")), `Sym "F")]

So I propose when comparing languages that we allow liberal use of
macros (in languages such as Lisp which support them) to tailor the
function names to the same length in all languages, so that when we
look at the code we are comparing the size of the parse tree and
*not* the relative lengths of names of functions. And furthermore
we allow add-on libraries to compress the size of the parse tree in
the application itself down to just the part that's different from
one application to another. So the only real difference is whether
the syntax of the language (with all available help from macros and
libraries) supports the most natural expression of the application
of the tool(s).

Then you will be discounting the enormous amount of Lisp code that is
devoted to Greenspunning modern language features like pattern matching.

Macros are very rare in modern languages.

Why?

Two reasons:

.. Macros are used primarily to work around deficiencies in the language's
design, e.g. the absence of pattern matching in Lisp.

.. Using macros forks the language and makes maintenance much harder. New
programmers must effectively learn a new language. Tools no longer work
transparently with source code. Typeful programming is made harder (except
perhaps in Haskell but I am unfamiliar with its typed macro system).

Because their parse tree isn't well defined and documented, so
it would be impossible for the author of a macro (in the Lisp
sense) to know what to do?

The parse tree is no longer relevant in modern languages.

Or because the syntax of the source code
is so different from the parse tree that it would be just too
difficult to mentally keep switching back and forth between
thinking about the parse tree (when designing the function of the
macro, what the macro is supposed to do) and writing code
(including the code of the macro-expander itself)?

In all modern languages, the parse tree (the internal representation used by
the compiler and macro system) is irrelevant due to the use of quotations.
Consequently, parse trees do not need to be handled directly by OCaml
macros.

Look at the following equivalent of UNWIND-PROTECT:

# EXTEND Gram
expr: LEVEL "top"
[ [ "try"; f=sequence; "finally"; g=expr ->
<:expr<
(function
| `Val v, g -> g(); v
| `Exn e, g -> (try g() with _ -> ()); raise e)
((try `Val $f$ with e -> `Exn e), (fun () -> $g$))
>> ] ];
END;;

The input source code is represented by a sequence of tokens and the
associativity and precedence of the new grammar rule, so the programmer is
no longer forced to write without syntax as you do in Lisp.

The action is represented by a quotation <:expr< ... >> that contains
ordinary OCaml code so, again, the programmer is no longer forced to write
without syntax as you do in Lisp.

Pattern matching is also instrumental here: you can quote code in patterns
and in expressions so you can rewrite OCaml code without having to deal
with the low-level parse tree at all.

This makes term rewriting much easier in OCaml than in Lisp and is the
reason why my OCaml solution to the Minim challenge (implementing a
compiler for a Lisp-like language) was far shorter than all of the Lisp
solutions.

Or because
authors of most modern languages are unaware how valuable macros
can be in improving the "out of the box" syntax to be better fit a
problem domain or to capture the essense of common design patterns
into a single syntax?

Ironically, exactly the opposite is true. Programming language designers
have long since superceded Lisp's capabilities and it is only the few
remaining Lisp programmers who have failed to realise this. We can now
improve syntax far beyond the capabilities provided by Lisp.

You can start reinventing these modern techniques in Lisp by writing
libraries, of course, but then you are doing nothing more than
Greenspunning modern macro systems like Camlp4.

Let's think about design patterns more. The whole idea of a design
pattern is that the language itself doesn't support direct
expression of a design, so the programmer must constantly think of
the pattern and write code in that form to put the pieces of a
pattern together into a coherent hole that is self-consistent. But
with a macro, all the programmer need do is call the name of the
design pattern and the parameters to it, and the macro-expader
automatically generates the whole self-consistent design pattern.
So from the view of the programmer, it's no longer a "pattern",
it's a utility.

Here's an example of what was originally a design pattern:
(unwind-protect
(let ((chan (open "foo.txt")))
(process-file chan))
(close chan))
The idea is that if something goes wrong while the file is open,
and PROCESS-FILE throws control out of the scope of its toplevel,
the CLOSE of the channel will nevertheless be executed, so the file
**will** end up closed before control reaches the main level of
program above this code.

So with a macro, it's no longer necessary to go to that pain of
writing all the stuff in just the right relationship. Now it's just:
(with-open-file (chan "foo.txt") (process-file chan))

Now that particular macro wasn't in older versions of Lisp before
Common Lisp, but the design pattern was so commonly useful that the
macro was incorporated into Common Lisp. But back in those earlier
versions of Lisp, programmers didn't have to write the
UNWIND-PROTECT pattern over and over. Instead they just wrote a
macro once and then called it. They didn't have to wait for Common
Lisp to do it. They could realize all by themselves that the design
pattern was common, and that the macro would condense the pattern
into a foolproof utility, so they just did it without waiting for
Common Lisp to provide it built-in. That's the value of
user-definable macros.

No. That is just another example of Lispers using macros for the sake of it.
You can implement UNWIND-PROTECT using only a higher-order function. There
is no need to use a macro here.

Now there are probably a *lot* of design patterns used in Common
Lisp and other languages which are not yet covered by pre-defined
macros or other mechanisms, because at the time Common Lisp or the
other laguages were standardized those patterns weren't recognized
as common enough in use with well-established macro that it seemed
reasonable to incorporate them into the standard language. After
time passes, in some problem domains it becomes common knowledge
that certain design patterns are common enough that it *really*
would be nice if the language supported them as a compact
expression *instead* of requiring the program to repeat the design
pattern over and over and over.

I'd like to see an example of a design pattern that cannot already be
factored out in modern FPLs without the use of macros.

Here is where languages such as
Lisp which have parse-tree macros, and other languages that don't
have them, differ: In Lisp, the appliation programmer has the power
to define a macro and thereby convert a design pattern into a
simple call. In languages without macros, the programmer does *not*
have that option, and must forever write out the entire design
pattern over and over and over until and unless the people defining
the standard and implement it decide to make that simple call part
of the language.

There is overwhelming evidence to the contrary: rich syntax is ubiquitous in
all modern languages (even ones where macros play a key role, like
Mathematica) but Lispers limp on without out it precisely because Lisp is
not capable of providing this hugely beneficial functionality is a usable
way.

That is also why Lisp is so verbose in practice: the burden of parsing is
placed upon the programmer who must write out programs in what should be an
internal representation.

OCaml has a full macro system but it is rarely used.

Is it a parse-tree macro system, like Lisp has, or is it a
string-syntax macro system, like C has?

OCaml's macros act upon parse trees internally but they are not imposed upon
the programmer as they are in Lisp.

If the former, then why isn't it used?

In Lisp, macros are used primarily to Greenspun pattern matching. OCaml
already provides pattern matching so that primary need is gone and macros
see a lot less use in OCaml as a consequence.

Perhaps as I guessed above, because the parse tree
is so different from the source syntax that it's confusing to
switch back and forth between the two?

Quotations and pattern matching make that a non-issue: you do not deal with
the parse tree in modern macro systems.

If it's a string-syntax
macro system like C, then it's total crap and I can understand why
it's rarely used.

In the time you wasted writing this sprawling strawman argument you could
instead have educated yourself about Camlp4.

F# has no macro system.

Then F# is crap compared to Lisp.

That is an uninformed opinion: you are not even aware of the effects of
laziness in this context and are oblivious to typeful programming despite
its ubiquity. Moreover, you have yet to come up with a single example of a
useful macro.

Why do you think statically typed languages completely dominate
general purpose programming?

Because a lot of people don't know any better and are stuck
with installed code base (legacy code) which must be maintained.

I think a lot of people do know better.

(The exception is in tight loops
where some machine type happens to work efficiently and where that
particular loop eats up most of the CPU time, which is precisely
where Lisp allows type declarations and consequent CPU-efficient
code as an *option* where it would actually be worth the cost.)

Lisp is unable to regain the performance cost of its dynamic design:

As long ago as around the late 1970's, MacLisp achieved a
reputation that tight code loops with proper declarations ran
faster than the equivalent loop in Fortran, because the code within
each function was identical but Lisp had more efficient
function-call linkage. Do you dispute that, or do you claim that
Common Lisp has taken a step backward and no longer is as fast in
tight loops with proper declarations as it should be?

You are not seriously trying to substantiate "Lisp is not cripplingly slow"
by citing performance data that are over 30 years out of date?

Yes. Finalizers are useful for the same reason that GC itself is
good (and Ada is bad): you don't want to have to shoehorn
everything you do into scopes just to ensure that things get freed
correctly.

On a philosophical level, I totally agree with you on this point.
I'm convinced you're not a troll!
Although you frequently go overboard on exaggerations in favor of
other languages and against Lisp, you *are* bringing up good issues
and occasionally (as here) actually showing some good insight.
So I'm not sorry I engaged you in debate.

Now let's discuss the practical matter, on a case by case basis:

If you are reading a file, you probably want only one agent reading
it at a time, and you probably want to process the contents of the
file straight through (or bulk copy the whole contents of the file
to a text-string and immediately close the file and process from
the string rather than from the file itself). Accordingly lexical
scope and UNWIND-PROTECT (such as in the WITH-OPEN-FILE macro) is
totally sufficient.

If you have a connection to a remote database, you may have
multiple agents sharing a single connection. But you want to close
the connection whenever the connection goes idle too long, and
automatically re-open it whenever somebody starts to use it again,
rather than keep it open yet idle the whole time anybody has a
handle on it. Accordingly a clock-timeout mechanism makes more
sense than a finalizer tied into the GC.

I'm hard pressed to think of any case where the best mechanism is a
GC-finalizer rather than lexical scope or clock-timeout and
auto-reopen. Please enlighten me with a few examples (no details,
just the essence of the problem to be solved) where GC-finalizer is
the *best* solution to resource management, OK?

Finalizers are good when memory pressure is relevant, i.e. when you are
handling resources that consume system memory. A clock timeout is unaware
of memory pressure so it will not be as good in such situations.

For example, finalizers are used to collect weakly referenced parts of a
cache. This allows the cache to be shrunk by the GC when memory pressure is
high.

Another reason is that garbage collectors often free sooner than that.

I don't really believe that. Either the pointer to the handle to
the resource is lexically scoped, in which case GC won't reclaim it
until that pointer goes out of scope,

That is wrong. GCs can and do collect before pointers go out of scope.
Indeed, GCs are completely unaware of the notion of scope.

I discussed this on comp.lang.c++ recently with a proponent of reference
counting. He did not believe that real GCs would collect data while it was
still theoretically in scope (he was trying to claim that reference
counting always collect at least as soon as a GC but that is completely
fallacious) so I provided several complete counter examples each of which
proved that real VMs already do this.

But if you can think of a really good use for random closure, I may
reconsider my opinion.

I used finalizers in OCaml to collect unused OpenGL display lists and
textures. The results were excellent: the code was greatly
simplified at no cost.

Sorry, I don't know what those are, and they don't seem to be a
central issue that is worth my spending a half hour doing a Google
search to find a tutorial to teach me what they are. Would you
please define their important characteristics in regard to this
discussion, i.e. where they are physically located,

In system memory and automatically cached on the graphics card by the OpenGL
driver.

what reason
they have for being kept around (until all references are gone),

They represent a piece of visualization code that you will want to redraw
every frame of animation while they might be visible.

why they can't be easily re-created if idle-timeout causes their
deletion but re-activity forces their re-existance,

Creating display lists is extremely slow: they are compiled to improve
performance.

why they can't
be kept around nearly *forever* (until they have been idle a week
or two and it really would be surely a waste to keep them around
longer), etc.?

That would be a memory leak, potentially on the graphics card which has
limited memory.

On Unix/Linux there's a concept of various processes
in a tree, with each process dependent on the next parent process.
When a parent dies, all children automatically die. It would make
sense for a display list or texture to be a property of a
sub-process, which stays around so long as the main process is
around, then when the main process dies the sub-processes are
auto-killed and the display list or texture then disappears. Why
doesn't that model fit what's actually happening with display lists
and textures?

Several reasons:

.. That would impose manual memory management upon the user. Using finalizers
completely automates memory management for the user.

.. Starting and reaping processes is very slow but display lists must be very
fast.

.. OpenGL is single process only so that cannot work anyway.

The only reason I can think of is when the main
process is some persistent daemon, such as the Apache server. But I
don't think the Apache server or any of the other persistent
daemons are written in OCaml,

Lots of persistent daemons are written in OCaml, including web servers.
XenEnterprise is underpinned by several daemons that are written in OCaml
including a high performance custom database.

so there must be some other reason
why that sub-process model wouldn't solve the problem
(hypothetically if OCaml didn't have finalizers, or if you decided
to translate your application to some other language that doesn't
have finalizers). So if you can succinctly enlighten me, please do.

Creating and deleting processes is manual memory management. OCaml and its
finalizers allow you to obtain the same results with less code and lower
development costs whilst also eliminating leaks.

The vast majority of computers do not have virtual memory because
they are embedded devices.

Yeah, that's a different environment from anything I've ever
programmed for. I would suspect that in an embedded device you
wouldn't want GC in the first place, you'd do better with mostly
static allocation with perhaps some non-cyclic dynamic structures
where reference count would work just fine. But it's late and maybe
I've lost track of the context.

Embedded devices used to be PICs but they now include phones and larger
devices that can have substantial resources.

I think you should aspire to earn money directly from customers
rather than asking people to give you money to develop your
software.

I like that idea very much. Will you find me a potential customer
who wants some serverside application that doesn't already exist?

You need to write a compelling sales page and make it indexable
by search engines and make it as easy as possible for users to give
you their money before plastering all of the news websites that you
can find with descriptions of your new service and links to it.

That's a strawman comparison.

No, it is my personal advice based upon years of succeeding in this
industry: growing a company from seed with no investment and increasing
profits exponentially year upon year when almost all others go bankrupt.

My reason for the above advice is that the "pre-customers" you are alluding
to are either unidentifiable or worthless. The vast majority of people who
offer you advice on the basis that they will buy your product end up not
buying anything and the people who do buy your product will be people whom
you have been completely unaware of and who have completely different
requirements and values.

Let me give you some examples:

My book OCaml for Scientists was written by me as a physical scientist and
aimed at other physical scientists. I made the same mistake that you are of
seeking out pre-customers and taking their advice. Atmospheric scientists
told me to strip out the maths and discuss interoperability with obscure
tools but virtually none of them actually paid for the book.
Bioinformaticians are probably our largest market of real paying customers.
I never got any advice from bioinformaticians whilst writing the book and
the fact that much of the content is well-suited to their work is a
complete fluke.

I was less fortunate with our technical presentation software Presenta where
I again made the mistake of listening to "pre-customers" who told me that
compiled OCaml binaries using OpenGL on Linux would work and that Mac OS X
was an important platform. I followed their advice and discovered that such
binaries are unusably unreliable on Linux (whereas the equivalent works
flawlessly on Windows because it has modern hardware-accelerated GUIs that
require this technology to work or Windows itself will not even work) and
Mac OS X has virtually no technical users with money and pitiful
development tools.

More recently, I have been completely ignoring advice from "pre-customers"
and have focussed on getting products to market faster based upon my
intuition of what people need. This is vastly more lucrative.

In fact before doing *either*, I need
to find some customers who desire some new server-side application
that is within my capability, and who are willing to work with me
to define use cases that I might offer to implement, and then once
that's all done *then* we start talking about a fixed-price
contract to implement those use cases, and alpha testing, etc. Then
after I have one set of satisied customers, only then would it be
reasonble to advertise to see if there might be lots of other
customers for an existing customer-pleasing application.

That is the second highest risk approach to building a company in this
industry, the highest risk being operating at a loss from VC funding as a
gamble that your entire company will be bought out soon enough that your
shares haven't been diluted to the point that they are worthless. Unless
you have a lot of capital sloshing around or a steady income, you will not
survive the inevitable contract droughts and null payments and will go
bankrupt very quickly.

I found that selling commodities like OCaml for Scientists to lots of
ordinary programmers is not only more stable than gambling but can also be
just as lucrative. For example, we recently lost a £25k contract with
Microsoft because they went into duck and cover mode for the US recession
but it doesn't matter because we will make more money in the long term by
spending that time diversifying to other platforms and developing more
commodity products that we will sell to hundreds of individuals over the
next three years.

So can you find me a first brainstorm-use-cases customer to get
this whole process started? I've mentionned lots of new software
technology I've implemented to the pointer where I feel they might
possibly be useful for a variety of applications that might be of
value to others. But until I find a potential customer who is
actually interested in what I have done and how I can work it into
a salable product, I don't know if any of the wondrous things I've
programmed really could lead to a salable product.

The only way to find out is to put it up for sale.

By the way, Tuesday of last week I gave a partial demo of some of
my computer-assisted instruction software, and the person seeing
the demo seemed to think it would be of interest to two of his
technical acquaintances, one very far away (about 1000 miles) and
one closer (about 40 miles), so he'll be contacting them to try to
get them interested.

Excellent. Best of luck.

--
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
.



Relevant Pages

  • Re: Fundamentals of Lisp efficiency?
    ... your lisp will likely drop dead. ... If your code has a `main loop' from which all subroutines ... Especially for lists. ... Naive floating point in Common Lisp is ...
    (comp.lang.lisp)
  • Re: Alternatives 4
    ... This does show something interesting about lisp, ... And the OCaml code demonstrates how other languages can provide both ... performance and brevity at the same time, which is the power of OCaml. ... Emacs is the best there is as far as an IDE for OCaml goes ...
    (comp.lang.lisp)
  • Re: reading file to list
    ... Common Lisp is the other way around. ... Your claim that this shows something wrong with lists is completely ... suppose we wanted to do it the way the Ruby code does it. ...
    (comp.lang.lisp)
  • Re: Alternatives 4
    ... performance and brevity at the same time, which is the power of OCaml. ... And the optimized lisp benchmarks twice as fast... ... I do not believe longer identifiers improve clarity at all in such ...
    (comp.lang.lisp)
  • Re: Program compression
    ... Haskell, SML, OCaml, Mathematica, F# and Scala all allow real ... problems to be solved much more concisely than with Lisp. ... for simple lists those three languages equal Lisp in conciseness. ... and thus there's no need for concurrent GC. ...
    (comp.programming)

Quantcast