Re: Lisp/Unix impedance [a programming challenge]



On Fri, 29 Apr 2005 16:31:42 +0200, Pascal Bourguignon wrote:
>> I don't expect you to implement it. I'm trying to get people to think
>> about the mismatch between Common Lisp and Unix and why there may be valid
>> technical reasons for avoiding Common Lisp as a glue language upon Unix.
>
> The mismatch is in your imagination. Most Common Lisp implementation
> on unix use by default an 8-bit 1-1 encoding that allow to process
> binary data as a stream of character like any other unix.

This was historically true. Common Lisp was a better match for Unix when
Common Lisp characters mapped one-to-one onto octets and vice versa.

I note that your code below:

(a) Switches back to using characters instead of octets [no more opening
of /dev/stdin as (unsigned-byte 8)]

(b) Changes the default encoding, which tends to belie your claim that
"Most Common Lisp implementation on unix use by default an 8-bit 1-1
encoding that allow to process binary data as a stream of character like
any other unix."

If I use the default encoding, this happens:
*** - invalid byte #xFF in CHARSET:UTF-8 conversion, not a Unicode-16

[Note that -e in echo is necessary to enable interpretation of
backslash-escaped characters]
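For contrast, here is a minimal sketch of the same cat-to-dog filter working on octets instead of characters, so no external format (UTF-8 or otherwise) is ever consulted and a byte such as #xFF passes through untouched. It is only an illustration under stated assumptions: the search strings are ASCII, the caller supplies streams opened with :ELEMENT-TYPE '(UNSIGNED-BYTE 8), and the helper names are hypothetical. Unlike Pascal's line-based original, it replaces every occurrence in the input, not just the first one on each line.

```lisp
;;; Sketch only: a cat->dog filter that works on octets, never on
;;; characters, so no external format is involved.  STRING-OCTETS,
;;; READ-ALL-OCTETS, REPLACE-OCTETS and FILTER-OCTETS are hypothetical
;;; helpers, not part of CLISP or the ANSI standard.

(defun string-octets (string)
  "Return the character codes of the ASCII-only STRING as an octet vector."
  (map '(vector (unsigned-byte 8)) #'char-code string))

(defun read-all-octets (in)
  "Read every remaining octet from the binary stream IN into a vector."
  (let ((buffer (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t :fill-pointer 0)))
    (loop for byte = (read-byte in nil nil)
          while byte
          do (vector-push-extend byte buffer))
    buffer))

(defun replace-octets (buffer old new)
  "Destructively replace every occurrence of OLD in BUFFER with NEW.
OLD must be non-empty and the same length as NEW, as \"cat\" and \"dog\" are."
  (assert (and (plusp (length old)) (= (length old) (length new))))
  (loop for start = (search old buffer)
          then (search old buffer :start2 (+ start (length old)))
        while start
        do (replace buffer new :start1 start))
  buffer)

(defun filter-octets (in out)
  "Copy binary stream IN to binary stream OUT, turning \"cat\" into \"dog\"."
  (write-sequence (replace-octets (read-all-octets in)
                                  (string-octets "cat")
                                  (string-octets "dog"))
                  out)
  (finish-output out))
```

Because nothing is ever decoded, the result does not depend on the locale at all; the cost is that line structure, if it matters, has to be found by looking for octet 10 by hand.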

Note also that any modern Unix should default to a UTF-8 locale (and if it
doesn't, it will almost certainly do so in the future). For a long time now
I haven't even built the ISO-8859-1 locale on many of my systems,
including the one I'm writing this reply on:
$ locale -a
C
en_NZ.utf8
POSIX

<http://www.cl.cam.ac.uk/~mgk25/unicode.html>

Red Hat Linux 8.0 (September 2002) was the first distribution to take
the leap of switching to UTF-8 as the default encoding for most
locales. The only exceptions were Chinese/Japanese/Korean locales, for
which there were at the time still too many specialized tools available
that did not yet support UTF-8. This first mass deployment of UTF-8
under Linux caused most remaining issues to be ironed out rather
quickly during 2003. SuSE Linux then switched its default locales to
UTF-8 as well as of version 9.1 (May 2004). Most other distributions
can be expected to follow soon.

(c) Your approach is broken anyway. You have not heeded the CLISP
implementation notes.

> [pjb@thalassa tmp]$ echo "\777dog eats cat" | ./filter
> \777dog eats dog
> [pjb@thalassa tmp]$ thru < filter
> #!/usr/local/bin/clisp -q -ansi -norc -E iso-8859-1
> (loop for line = (read-line *standard-input* nil nil)
> while line
> do (when (search "cat" line)
> (replace line "dog" :start1 (search "cat" line)))
> (princ line *standard-output*)
> (terpri))

The following script (filter2.sh) demonstrates what's wrong with the
character approach:

#!/usr/bin/clisp -q -ansi -norc -E iso-8859-1
(loop for char = (read-char *standard-input* nil nil)
while char
do (print char))

Compare this [carriage returns and line feeds (newlines) pass through
unmolested]:
$ echo -e -n "\r\n\r\n" | od -a -h
0000000 cr nl cr nl
0a0d 0a0d
0000004

... with this:
$ echo -e -n "\r\n\r\n" | ./filter2.sh

#\Newline
#\Newline

... this:
$ echo -e -n "\n\n\n\n" | ./filter2.sh

#\Newline
#\Newline
#\Newline
#\Newline

... and this:
$ echo -e -n "\r\r\r\r" | ./filter2.sh

#\Newline
#\Newline
#\Newline
#\Newline

From the CLISP implementation notes:
<http://clisp.cons.org/impnotes/clhs-newline.html>

When reading from a file, CR/LF is converted to #\Newline (the usual
convention on DOS), and CR not followed by LF is converted to #\Newline
as well (the usual conversion on MacOS, also used by some programs on
Win32). If you do not want this, i.e., if you really want to
distinguish LF, CR and CR/LF, you have to resort to binary input
(function READ-BYTE).

Also note the rationale: "In CLISP, #\Newline is identical to #\Linefeed
(which is specifically permitted by [ANSI CL standard] in section
Character Names)."
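Following that advice, a minimal sketch of telling LF, CR and CR/LF apart with READ-BYTE. COUNT-LINE-ENDINGS is a hypothetical helper, not something CLISP provides, and it assumes the stream it is given was opened with :ELEMENT-TYPE '(UNSIGNED-BYTE 8).

```lisp
;;; Sketch only: count bare LFs, bare CRs and CR/LF pairs by reading
;;; octets with READ-BYTE, so CLISP's #\Newline conversion never applies.
;;; COUNT-LINE-ENDINGS is a hypothetical helper name.

(defconstant +cr+ 13)
(defconstant +lf+ 10)

(defun count-line-endings (stream)
  "Return three values: the number of bare LFs, bare CRs and CR/LF pairs
read from the binary STREAM."
  (let ((lf 0) (cr 0) (crlf 0) (pending-cr nil))
    (loop for byte = (read-byte stream nil nil)
          while byte
          do (cond ((and pending-cr (= byte +lf+))
                    ;; CR immediately followed by LF: one CR/LF pair.
                    (incf crlf)
                    (setf pending-cr nil))
                   (t
                    ;; A CR not followed by LF was a bare CR.
                    (when pending-cr (incf cr))
                    (setf pending-cr (= byte +cr+))
                    (when (= byte +lf+) (incf lf)))))
    ;; A CR as the very last octet is also a bare CR.
    (when pending-cr (incf cr))
    (values lf cr crlf)))
```

Fed the three inputs above through such a stream (for instance one opened on /dev/stdin with that element type, assuming the implementation allows it), it would return 0, 0, 2 for \r\n\r\n; 4, 0, 0 for \n\n\n\n; and 0, 4, 0 for \r\r\r\r, whereas filter2.sh prints four #\Newline characters for both of the last two.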

"The mismatch is in your imagination."

Regards,
Adam