Re: parser



On 9 Sep., 21:15, "Bill Cunningham" <nos...@xxxxxxxxx> wrote:
What would be an example of a very simple parser? I have read about
parsers and that they come in 2 steps lexical and syntactic. Basically how
are they designed is my question; in a tree design.

Having written scanners (for lexical analysis) and parsers myself
several times for different purposes (including for the interpreter
of the Seed7 language) I think I have some suggestions. I did not
read all answers to your question. Please excuse me if I repeat the
obvious.

If you have the choice to determine the syntax to be parsed, I
suggest you choose a LL(1) one (for the lexical symbols and for the
syntax of the whole language). LL(1) means that all decisions can
be made based on the current symbol (or character for lexical
analysis). Pascal was designed to use a syntax which is parseable
with LL(1) methods. This design decision made Pascal easy to parse
and Pascal compilers slim. So maybe you take a look at the syntax of
Pascal (the one defined by Wirth. Some of the Borland extensions are
not so easy to parse). Other languages like FORTRAN, COBOL, C, C++
and Java were not designed with a concept how to parse them in mind.

A LL(1) lexical scanner is a function like 'getSymbol' which allows
to get the (character) input symbol by symbol. The scanner needs two
things to get its input:

1. A variable holding current character (often called just 'ch',
in the examples below I call it inFile.bufferChar).
2. A function to read the next character and to store it
in the current character variable (often called nextch, or
getchar, in the examples below
inFile.bufferChar := getc(inFile);
is used as the next character function).

By inspecting the current character the scanner does it's decisions
like:

while inFile.bufferChar in white_space_char do
inFile.bufferChar := getc(inFile);
end while;

As you can see character classifications such as white_space_char
are used to specify which characters the scanner should skip. If the
symbols are choosed wisely the scanner can also make other decisions
easily:

case inFile.bufferChar of
when {'a' ..'z'} | {'A' .. 'Z'}:
... read identifier
when {'+', '-', '*', '/'}:
... read operator symbol
when {'(', ')', '[', ']', '{', '}'}:
... read parenthesis
when {'0' .. '9'}:
... read numeric literal
when {'''}:
... read character literal
when {'"'}:
... read string literal
when {'#'}:
... skip comment and then read the next symbol
end case;

After the scanner has read a symbol (identifier, operator symbol,
parenthesis, or literal) the inFile.bufferChar (variable holding the
current character) should always contain the next character after
the symbol (which is possibly the first character of the next
symbol). The details of the scanned symbol are holded in some
variables (Pascal compilers even use global variables for this).
Usually you will have a symbol category (enumeration with values
like BEGIN, END, FUNCTION, IDENTIFIER, OPERATOR, PAREN,
NUMERIC_LITERAL, CHAR_LITERAL, STRING_LITERAL, END_OF_FILE). The
scanner can also have the job to recognize reserved words like
'begin' and to use the symbol category BEGIN to describe it. The
symbol category END_OF_FILE is used to signal EOF. Other variables
hold numeric character and string literal values.

I wrote also something about scanning functions here:
http://seed7.sourceforge.net/manual/file.htm#Scanning_a_file

The source code of some simple scanner functions is here:
http://seed7.sourceforge.net/prg/scanfile.htm

Since it is sometimes helpful to scan directly from a string
(without interfacing the string as file) there are also
scanner functions defined for strings:
http://seed7.sourceforge.net/prg/scanstri.htm

A recursive descend parser works similar as a scanner. It consists
of functions like 'ifstatement', 'whilestatement', 'expression',
'term' This functions use the current symbol to do the decisions
and call the scanner (getSymbol) to advance to the next symbol.
When a parser function is called it assumes that the first symbol
to be processed is already in the current symbol variable. When
a parser function is left the first symbol after the construct
processed is already in the current symbol variable. A parser
function to process a Pascal while statement is:

const proc: whilestatement is func
begin
insymbol;
expression;
if symbol.category = DOSY then
insymbol;
else
error(54);
end if;
statement(fsys);
end func; (* whilestatement *)

The parser function 'statement' (which has the job of parsing Pascal
statements) may contain the following case statement:

case symbol.category of
when {IDENT}:
insymbol;
if symbol.category = BECOMES then
assignment;
else
call;
end if;
when {BEGINSY}:
compoundstatement;
when {GOTOSY}:
gotostatement;
when {IFSY}:
ifstatement;
when {CASESY}:
casestatement;
when {WHILESY}:
whilestatement;
when {REPEATSY}:
repeatstatement;
when {FORSY}:
forstatement;
when {WITHSY}:
withstatement;
end case;
if symbol.category = SEMICOLON then
insymbol;
elsif symbol.category not in {ENDSY, ELSESY, UNTILSY} then
error(6);
end if;

If you think that LL(1) parsing is not powerful enough I can tell
you that Seed7 is processed with a LL(1) scanner and a table driven
recursive descend LL(1) parser.

While writing scanners and parsers is an interesting task, there
are also other possibilities. If you want to parse some programming
language you have also the possibility to hook at an existing
programming language. Seed7 provides several mechanisms to extend
the language syntactically and semantically. So you might take a
look at the following examples:

http://seed7.sourceforge.net/examples/declstat.htm
http://seed7.sourceforge.net/examples/for_decl.htm
http://seed7.sourceforge.net/examples/operator.htm

Seed7 provides also an interface to get the list/tree representation
of the parsed program. This way function calls are represented as
lists (of type ref_list) where the first element is the function
and the other list elements are parameters. There are several types
(category, reference, ref_list, program) which are used to access
this representation of a program. This types are described here:

http://seed7.sourceforge.net/manual/types.htm#category
http://seed7.sourceforge.net/manual/types.htm#reference
http://seed7.sourceforge.net/manual/types.htm#ref_list
http://seed7.sourceforge.net/manual/types.htm#program

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
.



Relevant Pages

  • Re: Has anyone hand-written a scanner/parser module?
    ... tools to create scanner/parser modules for their compiler projects. ... was wondering if anyone has developed a scanner or parser that they ... personally hand-wrote? ...
    (comp.compilers)
  • Re: Compiler construction projects
    ... I prefer to call it GNU 3DLDF in order to draw attention to the GNU Project ... I call it machine language programming in Pascal. ... I actually implemented the scanner and parser for 3DLDF purely on the basis ...
    (comp.compilers)
  • Re: Has anyone hand-written a scanner/parser module?
    ... personally hand-wrote? ... The scanner in GNU Awk is hand-written, for a yacc parser. ... and if the identifier isn't found in the table, ...
    (comp.compilers)
  • Problem with flex, parsing a large file
    ... as the scanner does not use REJECT explicitly. ... After that, parsed again the large file, and the parser seems to get ... The resulting program was able to parse past ... Darwin david-deharbes-ibook-g4.local 8.3.0 Darwin Kernel Version ...
    (comp.compilers)
  • Re: Stars Supernova Genesis Beta
    ... > But consider a game where you can have only one scanner per start ... > properties of the scenario design / game setup. ... powerful scanner could be the planetary scanner (or ... > Drawing a zone and saying that being in / entering the zone triggers ...
    (rec.games.computer.stars)