Re: Why is lisp so weird?

RobertMaas_at_YahooGroups.Com
Date: 05/04/04


Date: Mon, 03 May 2004 17:04:16 -0800


> From: Marcin 'Qrczak' Kowalczyk <qrczak@knm.org.pl>
> Erann Gat wrote:
> > Furthermore, there is no way to leverage the C++ parser at run time, so
> > even people who don't write compilers are forever writing parsers for
> > little languages for input data.
> You can use "read" only if you are free to choose the input format.

Correct. But whenever the original data is not in machine readable
format, you might as well enter it in s-expression format rather than
any other format, at which point LISP can read it naturally. Or if data
is collected by a form, where each field of data is within a single
field of the form, it's trivial to build an s-expression from the
fields, and so you might as well do that instead of building some
non-s-expression format and then hassling with parsing it. Even if the
data is in some machine-readable but non-s-expression format
originally, often you can manually convert it to s-expressions in a
text editor, or use editor macros to convert it semi-automatically. In
all those cases, passing it to READ tests whether you've done that
data-entry or conversion task correctly, at least syntactically: if
READ signals an error, something is malformed.
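
A minimal sketch of that check in Common Lisp. CHECK-SEXP and
FIELDS->SEXP are names made up for illustration, and the field names
are invented; only READ, PRIN1-TO-STRING and HANDLER-CASE are standard.

```lisp
;; CHECK-SEXP is a hypothetical helper, not a standard function.
(defun check-sexp (text)
  "Return (values OBJECT T) if TEXT holds exactly one well-formed
s-expression, otherwise (values NIL NIL)."
  (handler-case
      (let ((*read-eval* nil))              ; refuse #. read-time evaluation
        (with-input-from-string (in text)
          (let* ((object (read in))         ; signals an error if malformed
                 (extra  (read in nil in))) ; IN itself marks end of input
            (if (eq extra in)
                (values object t)
                (values nil nil)))))
    (error () (values nil nil))))

;; Building an s-expression from form fields (field names are invented).
(defun fields->sexp (name city age)
  (list 'person (list 'name name) (list 'city city) (list 'age age)))

;; PRIN1-TO-STRING escapes embedded quotes, so the text READs back intact.
(check-sexp (prin1-to-string (fields->sexp "Ann \"Q\" Lee" "Oslo" 42)))

;; Unbalanced input: READ hits end of file, so the second value is NIL.
(check-sexp "(person (name \"Ann\") (city \"Oslo\"")
```

Note that this only catches syntax errors, and that READ interns any
symbols it meets in the current package.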

In the worst case, you have to write an actual parser to convert it to
internal form, whereupon you can print it out to generate external
s-expression format.

In any of those cases, you can prettyprint it and just look at it to
see whether it's structured correctly, hence whether your editor macros
or parser *really* did the right thing: not just well-formed
s-expressions externally or pointy structures internally, but all the
hierarchical levels of the data structure correct.
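
For instance, something like the following would do, using the standard
pretty printer. The record is invented data and PRETTY-STRING is a
hypothetical helper; the exact line breaks depend on the implementation,
so no output is shown.

```lisp
;; Invented sample data, only there to have some nesting to look at.
(defparameter *record*
  '(order (customer (name "Ann") (city "Oslo"))
          (items (item (sku 101) (qty 2))
                 (item (sku 205) (qty 1)))))

;; Hypothetical helper: pretty-print OBJECT into a string, using a narrow
;; right margin so that nested levels are forced onto separate lines.
(defun pretty-string (object)
  (with-output-to-string (out)
    (let ((*print-pretty* t)
          (*print-right-margin* 40))
      (pprint object out))))

;; Show the structure at the terminal.
(write-string (pretty-string *record*))
```

The indentation follows the nesting, so a sub-list that ended up at the
wrong level is visible at a glance.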

> The program I had to write last week needed to parse an already existing
> file with lines looking like this:

<f>pobudzaj1c1<l>pobudzaf<t>pact:sg:acc:f:imperf:aff<t><N>pact:sg:inst:f:imperf:aff

The important thing is how that is supposed to represent hierarchical
data. What are the "words", what are the operators that connect those
words together, and what operator precedence determines which two words
are combined into a sub-structure before that whole sub-structure is
combined with another word or another sub-structure? Once you know the
answer to that question, you can write a parser for that kind of data.
Then, as I said above, prettyprint the result and just look at it to
see if you got the hierarchical levels correct.
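
As a first step, here is a sketch of a flat tokenizer for such a line.
It rests on a guess I can't confirm from the one sample: that each
<x> marker starts a field named x, that the field's value runs up to
the next "<", and that ":" separates sub-values. SPLIT-ON and
PARSE-TAGGED-LINE are hypothetical names.

```lisp
(defun split-on (char string)
  "Split STRING at each CHAR; return the list of substrings."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun parse-tagged-line (line)
  "Return a list of (TAG VALUE...) lists, one per <tag> marker in LINE.
Assumed format, see the guess stated above; text before the first
marker is ignored."
  (loop with pos = 0
        for lt = (position #\< line :start pos)
        while lt
        collect (let ((gt (position #\> line :start lt)))
                  (unless gt
                    (error "Unterminated tag in ~S" line))
                  (let* ((next  (or (position #\< line :start (1+ gt))
                                    (length line)))
                         (tag   (subseq line (1+ lt) gt))
                         (value (subseq line (1+ gt) next)))
                    (setf pos next)     ; resume scanning at the next marker
                    (cons tag (if (string= value "")
                                  nil
                                  (split-on #\: value)))))))

(parse-tagged-line
 "<f>pobudzaj1c1<l>pobudzaf<t>pact:sg:acc:f:imperf:aff<t><N>pact:sg:inst:f:imperf:aff")
;; => (("f" "pobudzaj1c1") ("l" "pobudzaf")
;;     ("t" "pact" "sg" "acc" "f" "imperf" "aff")
;;     ("t")
;;     ("N" "pact" "sg" "inst" "f" "imperf" "aff"))
```

This deliberately stops short of nesting. Whether the empty "t"
followed by "N" means that N belongs inside t is exactly the precedence
question above, and the sample alone doesn't answer it.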

> The parser took 25 lines of Perl.

And how do you know you got it correct without a default prettyprinter
to show you the levels of hierarchy? Did you have to write your own
prettyprinter from scratch just to see the data in structured form?
So when the parse of that one line was prettyprinted, what did it look
like?

> I advocate using safe languages whenever possible.

I suppose I agree. LISP is a safe language for server-side applications
if you take certain precautions:
- Collect all input (from HTTP/CGI transaction) into a string before
starting to process it further, so it's impossible for an error to
throw the LISP environment into a read-eval-print BREAK loop whereupon
it might then read some not-yet-read input and try to EVALuate it.
- Set up a special package which imports only the symbols you know for
sure are safe for your situation, re-defining any that would be unsafe
if imported as-is from the LISP package. Use this special package for
all your parsing of input.
- Disable all read-time evaluation when reading in the fields from the
HTML FORM contents.
- Traverse the result of safe-read-from-string to verify there aren't
any symbols outside that special package, i.e. to verify the user
didn't say otherpackage:symbolname somewhere in the form contents.
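
The precautions above might be sketched like this. It is one possible
arrangement, not a vetted security recipe: the package name FORM-INPUT,
the variable *FORM-READTABLE* and the function SYMBOLS-CONFINED-P are
made up for illustration, and the sketch goes one step further than the
list by rejecting every "#" syntax, so that #n= circular structures
can't hang the traversal.

```lisp
;; Hypothetical package that inherits nothing, so input can't name CAR etc.
(defpackage :form-input (:use))

(defparameter *form-readtable*
  (let ((rt (copy-readtable nil)))      ; copy of the standard readtable
    ;; Replace the # dispatch character: #., #n=, #( and friends all fail.
    (set-macro-character #\#
                         (lambda (stream char)
                           (declare (ignore stream char))
                           (error "# syntax is not allowed in form input"))
                         nil rt)
    rt))

(defun symbols-confined-p (object package)
  "True if every symbol in the cons tree OBJECT is homed in PACKAGE."
  (loop
    (cond ((consp object)
           (unless (symbols-confined-p (car object) package)
             (return nil))
           (setf object (cdr object)))  ; walk the cdr chain iteratively
          ((null object) (return t))    ; () and ends of lists
          ((symbolp object)
           (return (eq (symbol-package object) package)))
          (t (return t)))))             ; numbers and strings

(defun safe-read-from-string (text)
  "Read one object from the string TEXT; signal an error on anything
suspicious. Reading from a string means no unread input can be consumed."
  (let ((package (find-package :form-input)))
    (with-standard-io-syntax
      (let* ((*read-eval* nil)          ; belt and braces, # is already off
             (*readtable* *form-readtable*)
             (*package* package)
             (object (read-from-string text)))
        (unless (symbols-confined-p object package)
          (error "Input mentions a symbol outside ~A" (package-name package)))
        object))))
```

The caller would wrap the call in HANDLER-CASE so that a bad field
produces an error page rather than a BREAK loop. Note that a qualified
name such as cl-user::foo is still interned before the traversal
rejects it, and that deeply nested input can still exhaust the stack,
so limiting the input length beforehand seems prudent.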

Did I leave out any necessary precautions?

> Date: Sun, 29 Feb 2004 17:12:53 +0100
(I didn't see your article when it first appeared, because I didn't
have any efficient method for finding all followups (to stuff I had
posted) until just a few nights ago when I finally found your article
and put it in the queue to compose a followup myself. Sorry for the
very belated response.)


