Re: Cons cell archaic!?



On Oct 9, 4:44 pm, Anticomuna wrote:

How? Is there anything wrong with the concept? I thought it was so
easy to parse the S-expressions because of it. Is this really a
problem for Lisp or this is more of a non-Lisper automatic reaction to
something he is not used to?

How would the implementation of a Lisp without a using cons look like?
Would it have any advantage over the "archaic" one?

Not sure what you mean by “archaic” ones, but yes, lisp's cons and the
irregularities of its nested paren syntax are 2 fundamental problems
of lisp.

See:
• Fundamental Problems of Lisp
http://xahlee.org/UnixResource_dir/writ/lisp_problems.html

Plain text version follows.

---------------------------------
Fundamental Problems of Lisp
Xah Lee, 2008-07

In this article, we discuss 2 problems that are rooted in lisp. One is
the irregularity in its often cited regular syntax. The other is the
language's use of “cons” for list.
Syntax Irregularities

Lisp family of languages, in particular, Common Lisp, Scheme Lisp,
Emacs Lisp, are well know for its syntax's regularity, namely,
“everything” is of the form “(f x1 x2 ...)”.. However, it is little
talked about that there are several irregularities in its syntax. Here
are some examples of the syntax irregularity.

* The comment syntax of semicolon to end of line “;”.
* The dotted notation for cons cell “(1 . 2)”.
* The single quote syntax used to hold evaluation, e.g. “'(1 2
3)”.
* The backquote and comma syntax used to hold but evaluate parts
of expression, e.g. “(setq x 1) (setq myVariableAndValuePair
`(x ,x))”.
* The “,@” for inserting a list as elements into another list.
e.g. “(setq myListX (list 1 2)) (setq myListY (list 3 4)) (setq
myListXY `(,@ myListX ,@ myListY))”
* There are various others in Common Lisp or Scheme Lisp. For
example, the char “#” and “#|”. In Scheme's R6RS, it has introduced a
few new ones.

In the following, we detail how these irregularities hamper the power
of regular syntax, and some powerful features and language
developments that lisp have missed that may be due to it.
Confusing

Lisp's irregular syntax are practically confusing. For example, the
difference between “(list 1 2 3)”, “'(1 2 3)”, “(quote (1 2 3))” is a
frequently asked question. The use of “` , ,@” etc are esoteric. If
all these semantics use the regular syntactical form “(f args)”, then
much confusion will be reduced and people will understand and use
these features better. For example, The “'(1 2 3)” might be changed to
“(' 1 2 3)”, and

(setq myListXY `(,@ myListX ,@ myListY))

could have been:

(setq myListXY (eval-parts (splice myListX) (splice myListY)))

or with sugar syntax for typing convenience:

(setq myListXY (` (,@ myListX) (,@ myListY)))”

Syntax-Semantics Correspondence

A regular nested syntax has a one-to-one correspondence to the
language's abstract syntax tree, and to a large extent the syntax has
some correspondence to the language's semantics. The irregularities in
syntax break this correspondence.

For example, programers can pretty much tell what piece of source code
“(f x1 x2 x3 ...)” do by just reading the name that appears as first
element in the paren. As a contrast, in syntax soup languages such as
Java, Perl, the programmer must be familiar with each of its tens of
syntactical forms. (e.g. “if (...) {...}”, “for (....; ...; ...)
{...}”, “(some? this: that)”, “x++”, “myList = [1, 2, 3]” etc.) As a
example, if lisp's “'(1 2 3)” is actually “(quote 1 2 3)” or shortcut
form “(' 1 2 3)”, then it is much easier to understand.
Source Code Transformation

Lisp relies on a regular nested syntax. Because of such regularity of
the syntax, it allows transformation of the source code by a simple
lexical scan. This has powerful ramification. (lisp's macros is one
example) For example, since the syntax is regular, one could easily
have alternative, easier to read syntaxes as a layer. (the concept is
somewhat known in early lisp as M-expression) Mathematica took this
advantage (probably independent of lisp's influence), so that you
really have easy to read syntax, yet fully retain the regular form
advantages. In lisp history, such layer been done and tried here and
there in various forms or langs ( CGOL↗, Dylan↗), but never caught on
due to largely social happenings. Part of these reasons are political
and lisper's sensitivity to criticism of its nested parens.

In lisp communities, it is widely recognized that lisp's regular
syntax has the property that “code is data; data is code”. However,
what does that mean exactly is usually not clearly understood in the
lisp community. Here is its defining characteristics:

A regular nested syntax, makes it possible to do source code
transformations trivially with a lexical scan.

The benefits of a regular syntax has become widely recognized since
mid 2000s, by the XML language. The XML language, due to its strict
syntactical regularity, has developed into many transformation
technologies such XSLT, XQuery, STX etc. See XML transformation
languages↗.
Automatic, Uniform, Universal, Source Code Display

One of the advantage of pure fully functional syntax is that a
programer should never need to format his source code (i.e. pressing
tabs, returns) in coding, and save the hundreds hours of labor,
guides, tutorials, advices, publications, editor tools, on what's
known as “coding style convention”, because the editor can reformat
the source code on the fly based on a simple lexical scan.

Because lisp's syntax has lots of nested parenthesis, the source code
formatting is much more labor intensive than syntax soup languages
such as Perl, even when using a dedicated lisp editor such as emacs
that contain large number editing commands on nested syntax.

The lisp community, established a particular way of formatting lisp
code as exhibited in emacs's lisp modes and written guides of
conventions. The recognition of such convention further erode any
possibility and awareness of automatic, uniform, universal,
formatting. (e.g. the uniform and universal part of advantage is
exhibited by Python)

As a example, the Mathematica language features a pure nested syntax
similar to lisp but without irregularities. So, in that language,
since version 3 released in 1996, the source code in its editor are
automatically formatted on the fly as programer types, much in the
same way paragraphs are automatically wrapped in a word processor
since early 1990s

Note the phrase “automatic, uniform, universal, source code display”.
By “automatic”, it means that any text editor can format your code on
the fly or by request, and this feature can be trivially implemented.
By “uniform”, it means there is one simple and mechanical heuristic,
to determine a canonical way to format any lisp code for human-
readable display. By “universal” is meant that all programers, will
recognize and habituated with this one canonical way, as a standard.
(they can of course set their editor to display it in other ways)

The “uniform” and “universal” aspect is a well-known property of
Python lang's source code. The reason Python's source code has such
uniform and universal display formatting is because it is worked into
the language's semantics. i.e. the semantics of the code depends on
the formatting (i.e. where you press tabs and returns). But also note,
Python's source code is not and cannot be automatically formatted,
precisely because the semantics and formatting is tied together. A
strictly regular nested syntax, such as Mathematica's, can, and is
done, since 1996. Lisp, despite its syntax irregularities, i think it
still can have a automatic formatting at least to a large, practical,
extent. Once lisp has automatic on-the-fly formatting (think of
emacs's fill-sexp, auto-fill-sexp), then lisp code will achieve
uniform and universal source code formatting display.

The advantage of having a automatic, uniform, universal, source code
display for a language is that it gets rids of the hundreds of hours
on the labor, tools, guides, arguments, about how one should format
his code. (this is partly the situation of Python already) But more
importantly, by having such properties, it will actually have a impact
on how programer codes in the language. i.e. what kind of idioms they
choose to use, what type of comments they put in code, and where.
This, further influences the evolution of the language, i.e. what kind
of functions or features are added to the lang. For some detail on
this aspect, see: The Harm of Manual Code Formating
Syntax As Markup Language

One of the power of such pure nested syntax is that you could build up
layers on top of it, so that the source code can function as markup of
conventional mathematical notations (i.e. MathML) and or as a word-
processing-like file that can contain structures, images (e.g.
Microsoft Office Open XML↗), yet lose practical nothing.

This is done in Mathematica in 1996 with release of Mathematica
version 3. (e.g. think of XML, its uniform nested syntax, its diverse
use as a markup lang, then, some people are adding computational
semantics to it now (i.e. a computer language with syntax of xml. e.g.
O:XML↗). You can think of Mathematica going the other way, by starting
with a computer lang with a regular nested syntax, then add new but
inert keywords to it with markup semantics. The compiler will just
treat these inert keywords like comment syntax when doing computation.
When the source code is read by a editor, the editor takes the markup
keywords for structural or stylistic representation, with title,
chapter heading, tables, images, animations, hyperlinks, typeset math
expression (e.g. think of MathML↗) etc. The non-marked-up keywords are
shown as one-dimensional textual source code just like source code is
normally shown is most languages.)
Frequently Asked Questions

You say that lisp syntax irregularities “reduce such syntax's power”.
What you mean by “syntax's power”?

Here are some concrete examples of what i mean by power of syntax.

In lisp, the comment is done by the char “;” running to end of line.
Note that this does not allow nested comment. So for example, if you
have multi-line code, and you want to comment out them all, you have
to prepend each line by a semicolon. However, if you have nested
comment syntax, one could just bracket the block of code to comment it
out. This, is a simple, perhaps trivial, example of “power of a
syntax”.

In Python, the formatting is part of the lang's syntax. Some
programers may not like it, but it is well accepted that due to
Python's syntax, python code is very easy to read, and it much done
away about programer preferences and argument about code formatting.
This is example of power of a syntax.

Let me give another, different example. You know that perl's syntax,
often the function's arguments do not necessarily need to have a paren
around it. For example, “print (3);” and “print 3;” are the same
thing. This is a example of power of syntax, when considered as a
flexibility or save of typing, for good or bad. Similarly, in
javascript for example, ending semicolon is optional when it is at end
of a line. (for sample perl and python code, see Xah's Perl and Python
Tutorial.)

In Mathematica, the language has a syntax system such that you can use
fully regular nested notation (e.g. “f[g[x]]”), postfix notation (e.g.
“x//g//f”), prefix notation (e.g. “f@g@x”), infix notation (e.g.
“1~Plus~2”), for ANY function in the language, and you can mix all of
the above. (prefix and postfix by nature are limited to functions with
just 1 arg, and infix notation by nature are limited to functions with
2 args) This is a example of power of syntax.

(for detailed explanation of Mathematica syntax and comparison to
lisp's, see: The Concepts and Confusions of Prefix, Infix, Postfix and
Fully Functional Notations )

In general, a computer lang has a syntax. The syntax, as text written
from left to right, has various properties and characteristics. Ease
of input (think of APL as counter example), succinctness (e.g. Perl,
APL), variability (Perl, Mathematica), readability (Python),
familiarity (C, Java, Javascript, ...), 2-dimensional notation (e.g.
traditional math notation) (Mathematica), ease of parsing (lisp),
regularity (APL, Mathematica, Lisp, XML), flexibility (Mathematica)...
etc. Basically, you can look at syntax, and programer's need to type
them, and how the textual structure can be mapped to the semantic
space, from many perspectives. The good qualities, such as ease of
input, ease of reading, ease of parsing, ease of transformation,
simplicity, flexibility, etc, can be considered as elements of the
syntax's power.

As a example of syntax of little power, think of a lang using just the
symbol “0” and “1” as its sole char set.

Many of lisp's sugar syntax are designed to reduce nested paren. Why
using a more consistent, but more annoying sugar syntax?

Ultimately, you have to ask why lisper advocate nested syntax in the
first place.

If lispers love the nested syntax, then, the argument that there
should not be irregularities, has merit. If lispers think occasional
irregularities of non parenthesized syntax is good, then there's the
question of how many, or what form. You might as well introduce “++i”
for “(setq i (1+ i))”.

Further readings:

* The Concepts and Confusions of Prefix, Infix, Postfix and Fully
Functional Notations
* The Harm of Hard-wrapping Lines
* A Text Editor Feature: Syntax Tree Walk
* A Simple Lisp Code Formatter

The Cons Business

The above are some of the damages of irregularity in lisp's syntax.
The other fundamental problem in the language is its cons cells as its
list construction primitive.

Lisp at core is based on functional programing on lists. This is
comparatively a powerful paradigm. However, for historical reasons,
lisp's list is based on the hardware concept of “cons” cell.. From a
mathematical point of view, what this means is that lisp's lists is
limited to a max of 2 elements. If you want a longer list, you must
nest it and interpret it in a special way. (i.e. effectively creating
a mini-protocol of nesting lists, known as proper lists.) The cons
fundamentally crippled the development of list processing.

Lisp being historically based the cons for like 2 or 3 decades. The
cons (and cdr, car, caadar etc) are fundamentally rooted in the lisp
langs, is thus not something that can be easily mended. This is
unfortunate. However, this situation could be improved. (by, for
example, exposing only higher-level list manipulation functions in all
newer literature, or even mark cons related functions as obsolete)

One of the myth that is quickly injected into budding lispers, is that
cons are powerful. It is powerful in the sense any assembly lang is
powerful. Lisp's cons is perhaps the greatest *** up in the history
of computer languages.

The cons issue bugged me for 10 years, since i first tried to learn
scheme in ~1999. I've never worked with lisp (other than academic
reading) until in recent years with emacs lisp since 2005. Several
times i tried to explain to lispers this problem, but usually with
paragraphs and paragraphs of descriptions, analogies, examples,
frustrated by how hard it is to convey such a simple flaw (aided by
their blind, deep-seated, lisp fanaticism). Yesterday it hit on me.
The above mathematical aspect of lisp's cons is the first time i
formulated the cons problem concisely. (my previous verbose
explanation here: Lisp's List Problem )
Frequently Asked Questions

If you don't like cons, lisp has arrays and hashmaps, too.

Suppose there's a lang called gisp. In gisp, there's cons but also
fons. Fons are just like cons except it has 3 cells with car, cbr,
cdr. Now, gisp is a old lang, the fons are deeply rooted in the lang.
Every some 100 lines of code you'll see a use of fons and car, cbr,
cdr, or any one of the caar, cdar, cbbar, cdbbar, etc. You got annoyed
by this. You as a critic, complains that fons is bad. But then some
gisp fanatics retort by saying: “If you don't like fons, gisp has
cons, too.”.

You see, by “having something too”, does not solve the problem of
polution. Sure, you can use just cons in gisp, but every lib or
other's code you encounter, there's a invasion of fons with its cbbar,
cdbbar, cbbbr. The problem created by fons cannot be solved by “having
cons too”.

I like the cons concept. Even in functional languages like Haskell it
is popular, e.g. when matching in the form of (x:xs), which is the
same like car/cdr in Lisp.

Languages that has a list data type and First, Rest functions do not
mean it has lisp's cons problem.

One part of the cons problem in lisp is that it forces programer to
think of list in a low-level nested of 2-item construction, with
explicit functions like “cons”, “car”, “cdr”, “caar”, “cadr”, “cdar”,
“cddr”, “caaar”, “caadr” etc.

In other langs, the programer is not forced to think of nested 2-
items.

The other problem with lisp's cons, is that it hinders any development
of tree data structure. For example, one might write a function that
extracts the leafs of a tree. But due to lisp's list made of cons,
there is a different interpretations of what's considered a leaf.
Similarly, binary tree in lisp can be implemented either using cons
natively, or use so-called “proper list” that is implemented on top of
cons. Worse, any proper list can be mixed with improper list. So, you
can have a list of cons, or cons of lists, cons of cons, list of
lists, or any mix. The overall effect of the cons is that it prevents
lisp to have a uniform view of tree structure, with the result that
development of functions that work on tree are inconsistent, few, or
otherwise hampered.

Further readings:

* Lisp's List Problem
* My First Encounter And Impression Of Lisp
* The Jargon “Lisp1” vs “Lisp2”

Why Lisp's Cons Problem Never Got Fixed?

Now, a little speculation.

We might wonder, why lisp has the cons problem and was never
addressed?

I guess at the time, 1960s and 1970s, the very fact that you could
have a concept like a list in a lang and manipulate it as a unit, is
extremely advanced at the time. The list, being built up by hardware
cons, is just something that has to be done.

Having data as a single manipulatable object (list) didn't really
become popular until the 1990s. (notably by Perl) And today, it is THE
most important feature of high-level languages (perl, python, php,
javascript... of the langs i'm expert of).

The lisp's cons, as a underlying primitive that builds lists, even
though a bit cumbersome, but works just fine when list structure are
simple. Even today, with all the perl, python, php, javascript etc
langs that deal with lists, vast majority of list usage is just simple
flat list, sometimes 2 level of nesting (list of list, list of hash,
hash of list). 3 levels of nesting is seldom used unless its 3d
matrices used mostly in computer graphics or linear algebra
applications. Greater than 3 level is almost never seen. Systematic
manipulation and exploitation of nested list, such as mapping to
leafs, to particular level, transposition by permutation on level, or
list structure pattern matching in today's functional langs, etc is
hardly ever to be seen. (except in Mathematica, APL)

So, in general, when you just deal with simple lists, the
cumbersomeness in using cons, car, cdr, caardr etc for list doesn't
really surface. Further, the cons is fundamentally rooted in the
language. It's not something that can be easily changed except
creating a new language. When there is a specific need in a
application, there is a haphazard collection of functions that deal
with lists at a higher level.

Today, with list manipulation being mainstream, especially with the
uprising of many functional langs, the lisp's cons historical baggage
becomes more apparent and painful.
Will Lisp ever be Popular?

Mathematica today sells for over 2 thousands dollars. Its sales
record, throughout its history, is probably more than ALL commercial
lisps combined. Such a illuminating social fact, in not known in lisp
communities. These fuckheads thinks that lisp is more popular.

10 years ago, in the dot com days (~1998), where Java, Javascript,
Perl are screaming the rounds. It was my opinion, that lisp will
inevitably become popular in the future, simply due to its inherent
superior design, simplicity, flexibility, power, whatever its existing
problems may be. Now i don't think that'll ever happen as is. Because,
due to the tremendous technological advances, in particular in
communication (i.e. the internet and its consequences, e.g. Wikipedia,
youtube, youporn, social networks sites, blogs, Instant chat, etc)
computer languages are proliferating like never before. (e.g. Erlang,
OCaml, Haskell, PHP, Ruby, C#, F#, Perl6, Arc, NewLisp, Scala, Groovy,
Lua, Q, Qz, Mercury, Scratch, Flash, Processing, (see Proliferation of
Computing Languages) ..., helped by the abundance of tools, libraries,
parsers, existence of infrastructures) New langs, will either have
most advantages of lisps, and or with modern libraries and idioms that
better fits today's need. I see that, perhaps in the next decade, as
communication technologies further hurl us forward, the proliferation
of langs will reduce to a trend of consolidation (e.g. fueled by
virtual machines such as Microsoft's “.NET”. (and, btw, the breaking
of programer's social taboo of cross communication of computing
languages, led by Xah Lee)).

There is one slight hope for Lisp the lang as we understood it, and
that is Emacs Lisp. Due to, it being deeply rooted in the industry as
a powerful text editor, and the embedded lisp have major practical
impact in getting people to know lisp and also actually use it for
practical need. The installed base of emacs lisp system is perhaps 100
or 1000 times more than the number of all Common Lisp and Scheme
Lisp's installed base. (this satisfying prospect is however mar'd by
the tech geekers. e.g. the Common Lispers, and Scheme Lispers,
perennially inject snide and sneer at emacs lisp that harms its
progress, while the fucking emacs priests, want to coffin emacs in its
1980s UI and terminologies. (see The Modernization of Emacs, Text
Editors Popularity.))

--------------------------

Xah
http://xahlee.org/


.