Re: A Proper Lexer



I second this. In most case there's no need for regular expression
when doing lexer job.

Months ago I wrote an ASN.1->Lisp compiler [1], the lexer part is
whole implemented by CL readtable. ASN.1 token is quite complicated
but CL readtable is powerful enough.

--binghe

[1] http://sourceforge.net/project/showfiles.php?group_id=209101&package_id=285435

Paul Tarvydas wrote:
akopa wrote:

Does any one have any advice on doing fast-ish character I/O for a Lex
like scanner. I've read http://www.emmett.ca/~sabetts/slurp.html,
but I want to present single chars to the scanner.

...

I was a little surprised to find no equivalent of Lex available for
Common Lisp. There is http://www.geocities.com/mparker762/clawk but
it is string oriented, not stream oriented.


There is little incentive to use Lex in lisp, because the lisp reader *is* a
lexer (tokenizer), hence, lisp is the language least likely to have a
Lex-like library. I've built many small languages and just used what lisp
gives me, e.g. if you can design it so that the little language has a space
between each token, you don't have to write a lexer. I suppose you could
get fancier by employing read macros, et al.

When I write languages that cannot follow the above simplification, I use my
home-grown CL versions of extended S/SL (syntax/semantic language) and my
experimental packrat parser (which describes the construction of tokens and
parses - google for "Bryan Ford Packrat Parser"). I can share these with
you, but they are not Lex-like (nor are they well-documented, but there is
always email :-).

By definition, lexing something a character at a time is, by definition, a
state machine. If you design it with a state machine macro - code executed
on entry, transition and exit - you will not tie yourself up in knots. I
have a state machine macro which I could share.

Lexing is often a simple enough job that you can just roll it by hand using
a few simple primitives (I think I can list them, if you're interested).
Again, the state machine paradigm is the way to go.

As for speed, what are you doing? I've stopped worrying about speed in
lexing and parsing - desktop machines are so fast these days that I seem to
get away with fully-backtracking parsers and a packrat parser with no
packratting and don't even notice.

And if you REALLY want Lex, then I know from direct experience that you can
code one up directly from the algorithms in the Dragon book (I did it about
30 years ago, but my version is lost, or on an 8.5" floppy :-).

It ain't exactly what you asked for, but contact me if any of the above is
of interest.

pt
.



Relevant Pages

  • Re: Making a Calculator program in LISP
    ... The first thing to understand about Lisp, ... The function READ will have done the work of constructing tokens for you, ... and no other clause considered. ... which is like F,4,5) in an algebraic language. ...
    (comp.lang.lisp)
  • Re: A Proper Lexer
    ... Common Lisp. ... e.g. if you can design it so that the little language has a space ... If you design it with a state machine macro - code executed ... get away with fully-backtracking parsers and a packrat parser with no ...
    (comp.lang.lisp)
  • Re: memory usage in cmucl
    ... transform that language into Lisp code to implement it. ... in the state machine to be executed. ...
    (comp.lang.lisp)
  • Re: silly: "spel" instead of "macro"
    ... In C/C++ and other languages with a typical pre-compiler, the tokens are predefined. ... While in C/C++ you are limited to replace expressions by patttern matching with others, you have the full Lisp language at hand to calculate the substitution, and what is even more powerfull, you have state. ...
    (comp.lang.lisp)
  • Re: ILC2005: McCarthy denounces Common Lisp, "Lisp", XML, and Rahul
    ... >> the language should be available to users. ... In the design of Common Lisp, I asked Dave Moon (one of the architects ... Now, there are good kinds of low-level, like the way that floats are ...
    (comp.lang.lisp)