Re: Script Parser



CJReilly@xxxxxxxxx wrote:
I am trying to build a simple script parser with basic-like syntax and
including the standard loops, variables, if/else statements, and such
found in most core languages.

I was hoping someone could point me to a good resource. I've tried
searching for books and articles online, but no luck. :-( Unfortunately
my comp. sci. class voted against the script parser lesson.


If you are doing this for the sake of enlightenment,
good; there is more enlightenment to be had out of
writing a basic interpreter than all the software
engineering books in the world.

If you are doing this out of necessity (as part of
a larger project, for example), then don't bother;
see Pascals response upthread.

I'm not really an expert at this but I'll show you
enough so that you'll have an idea of what needs
to be done and you'll know what to search for when
doing it. Don't worry about misleading inaccuracies,
the comp.programming folk are more than able to
correct me where I've gone wrong.

The first thing you must do is come up with a grammer
describing your language; google for "BNF tutorial".
BNF is simply a nice way to describe a grammer.

An example BNF for fractions would look somewhat
like this[1]:
1. S = '-' F | F
2. F = DL '/' DL
3. DL = D | D DL
4. D = '0' | '1' | '2' | '3' | '4' |
'5' | '6' | '7' | '8' | '9'

A rough translation into english of the above BNF is:
1. The Start set (S=START) is either a minus sign
followed by a fraction or a fraction. The "|"
character denotes "OR", so
'-' F | F
means
('-' F) OR (F).
2. A fraction is a DigitList (DL) followed by a '/'
followed by another DigitList.
3. A DigitList is either a Digit (D) or a Digit followed
by a DigitList.
4. A Digit is one of the characters '0' ... '9'.

The second thing you need to do is turn your
grammer from BNF into actual code - i.e. write the
code that can recognise and parse the tokens described
in BNF into a tree that can be used later.

The problem when parsing, you will find, is that
you must know which BNF rule to use to parse the
grammar compounded that the rules are recursive
(see Rule 3 above).

Luckily, these are easily solved (google for
"LL parsing" and "LR parsing"). You can use basically
any algorithm to transform input into a tree based
on rules in the BNF. One way is as follows:
1. Read in one character from input.
2. For each character,
a) Append character to a string representing
the current token.
b) If token satisfies any of the BNF rules,
then store token according to that rule
*and* clear the string representing token.
3. If no more input, then end, else goto 1.

(Of course, it's a little more complicated than that
so I advise learning LL and LR parsing).

Right, now that we have a way to read and store
input in an unambigous manner, we can attempt to
execute that input.

In 2.a above, you might have wondered where to store
the token. You store the token in a tree. For rule 2
in the BNF above, the tree should look something
like:
F
|
|
'/'
/ \
/ \
DL DL


After you've read in all the tokens and stored
them in a tree, you merely traverse the tree and
execute what each token specifies;
Example 1: expression evaluation.
input:
a = 5 * 4
tree:
...
/
/
a
|
|
*
/ \
/ \
5 4

You traverse the tre from the top, find "a", find "*"
then replace "*" with 5 * 4,
...
/
/
a
|
|
9
then replace "a" with 9:
...
/
/
9

All the other infix operators (+, -, /, >, <, etc)
will function the same.
-------------------------------------------
Example 2: The if statement.
input:
if a > b then
print "larger"
else
print "smaller"
Tree:
...
/
/
if
|
|
a > b
/ \
/ \
/ \
print "larger" print "smaller"

As you traverse down the tree, you encounter "if",
you evaluate "a > b" and traverse down the right branch
if the evaluation is true and down the left branch
if the evaluation is false.

You will do this for every construct (while, for,
etc) but bear in mind that each construct will be
represented in your grammer *first*, and the tree
must be built according to the rules of the grammer.

Whew :-) That was longer than I intended,

[1] Not exactly like this; I might be wrong on some
of the details so I urge you to find a proper
tutorial on BNF; it's been a long time since I
actually read the literature on any of this stuff
so I might be remembering the details slightly
wrong.

goose,
hth

.



Relevant Pages

  • Macro functionality at runtime?
    ... chose Antlr) and transforms your program tree, so for example, I hope ... Takes a set rules -- tokens to match to a syntax tree, ... (defmacro apply-rule (tokens sub-tree new-tree) ...
    (comp.lang.lisp)
  • Re: Code for converting forum pseudo-tags to HTML
    ... (defun preparse-post (text) ... "Find all tokens of the form in the text and construct a list ... "Produces a parse tree out of the text" ... (format-tree tree stream) ...
    (comp.lang.lisp)
  • Re: Code for converting forum pseudo-tags to HTML
    ... (defun preparse-post (text) ... "Find all tokens of the form in the text and construct a list ... "Produces a parse tree out of the text" ... (format-tree tree stream) ...
    (comp.lang.lisp)
  • Re: Code for converting forum pseudo-tags to HTML
    ... (defun preparse-post (text) ... "Find all tokens of the form in the text and construct a list ... "Produces a parse tree out of the text" ... (format-tree tree stream) ...
    (comp.lang.lisp)
  • Re: Script Parser
    ... Chris wrote: ... > would need a tree structure, and I'll probably end up using this ... The BNF is the first bit, ... infixOperatorExpr = var operator var ...
    (comp.programming)