Re: BNF notation needed




Brooks Moses wrote:
<snip>

Finally, note the slightly ominous note at the bottom of this old post.
I'm not quite sure what the "tokenization problem" referred to is, but
it may be worth thinking about:
http://compilers.iecc.com/comparch/article/96-07-186

Programs are translated through several stages. The first stage,
lexing, recognizes the basic components of the language (Tokens) and
puts them into categories. These categories normally include language
identified names with special meanings (keywords), user defined names,
string literals, numeric literals, specific punctuation symbols, etc.
The parser uses a grammar that relies on the category associated with
each token to determine the overall structure of the program. You have
a tokenization problem if the natural categories generated by the lexer
do not fit the natural categories required by the parser and its
grammar. In the worst case the grammar can be completely ambiguous. In
many cases the grammar is not truly ambiguous, but the parser
(typically for efficiency reasons) is designed to only accept grammars
which fit into a simpler subset of all grammars. For most languages the
techniques to rewrite grammar into a form acceptable by the available
tools, is straightforward using well known rules.

The published nominal BNFs for Fortran are ambiguous and either cannot
be distinguished at all in a context free grammar, or can only be
distinguished by massive rewrites of the grammar. Few automated tools
deal well with context sensitive grammars. Problems relatively unique
to Fortran include
1. Lack of reserved keywords
2. Similarity of statement functions to assignment to array elements
3. Similarity of function calls and array element access

The first problem is normally (always) handled by a backtracking lexer.
Fortran has special rules on where punctuation (e.g.. "(", ")", "=",
and ",") can appear. There are variants, but the lexer might initially
assume at the start of a statement that it is an assignment. If the
punctuation does not fit this assumption, it then checks if the start
of the statement is a Fortran keyword, if it is the lexer marks it as
the appropriate keyword and continues processing, otherwise it
indicates that a lexical error has occurred, and either stops
processing or makes its best guess of a possible fix and continues
lexing, perhaps generating spurious error messages if its assumed fix
is incorrect..

The second and third problems can be handled by using context to
identify which names are known from earlier context to be arrays (which
have to be explicitly declared early in the procedure, before the
ambiguities can appear). This is easily done in a handwritten recursive
descent parser. In automatic parsers the ambiguity is normally not
resolved, but the grammar input to the parser instead uses something
like statement_function_or_array_name (instead of separately
distinguishing statement _function_name and array_name for ambiguity 2)
and function_or_array_name (instead of function_name or array_name for
ambiguity 3) which is then disambiguated by a simple second pass over
the resulting parse.

<snip>

.



Relevant Pages

  • Re: Is pyparsing really a recursive descent parser?
    ... There are an enormous variety of parsing tools, ... For instance, the Earley parser ... Okay, in some contexts, an ambiguous grammar may be considered ... over the other through the context of the parse results? ...
    (comp.lang.python)
  • Re: Parsing ambiguities
    ... elegant way to generate parsers that can deal with ambiguity. ... Error Correcting Parser Combinators: ... If yours is "the same" as Parsec (no way ... prepositional phrases just plain aren't in the grammar ...
    (comp.lang.misc)
  • Re: Parsing fully context-free grammars
    ... This is one of the problems with GLR parsers. ... then you get a parser that is ambiguous also. ... This problem can be resolved at the grammar level and avoid the ... The problem is the ambiguity in the grammar, ...
    (comp.compilers)
  • Re: Why LL(1) Parsers do not support left recursion?
    ... languages are ambiguous and don't use deterministic language theory.) ... expressed using either left or right recursion in a grammar. ... the rest of what I said implying accepting ambiguity, ... I would write regular expressions if I wanted to emphasize the fact ...
    (comp.compilers)
  • Re: Is pyparsing really a recursive descent parser?
    ... The important part is what it recurses through... ... of that grammar meant. ... ambiguity doesn't have to do with recognizing the language. ... PyParsing is parser. ...
    (comp.lang.python)