Re: BNF notation needed
- From: wclodius@xxxxxxxx
- Date: 12 Apr 2006 16:37:07 -0700
Brooks Moses wrote:
<snip>
Finally, note the slightly ominous note at the bottom of this old post.
I'm not quite sure what the "tokenization problem" referred to is, but
it may be worth thinking about:
http://compilers.iecc.com/comparch/article/96-07-186
Programs are translated through several stages. The first stage,
lexing, recognizes the basic components of the language (Tokens) and
puts them into categories. These categories normally include language
identified names with special meanings (keywords), user defined names,
string literals, numeric literals, specific punctuation symbols, etc.
The parser uses a grammar that relies on the category associated with
each token to determine the overall structure of the program. You have
a tokenization problem if the natural categories generated by the lexer
do not fit the natural categories required by the parser and its
grammar. In the worst case the grammar can be completely ambiguous. In
many cases the grammar is not truly ambiguous, but the parser
(typically for efficiency reasons) is designed to only accept grammars
which fit into a simpler subset of all grammars. For most languages the
techniques to rewrite grammar into a form acceptable by the available
tools, is straightforward using well known rules.
The published nominal BNFs for Fortran are ambiguous and either cannot
be distinguished at all in a context free grammar, or can only be
distinguished by massive rewrites of the grammar. Few automated tools
deal well with context sensitive grammars. Problems relatively unique
to Fortran include
1. Lack of reserved keywords
2. Similarity of statement functions to assignment to array elements
3. Similarity of function calls and array element access
The first problem is normally (always) handled by a backtracking lexer.
Fortran has special rules on where punctuation (e.g.. "(", ")", "=",
and ",") can appear. There are variants, but the lexer might initially
assume at the start of a statement that it is an assignment. If the
punctuation does not fit this assumption, it then checks if the start
of the statement is a Fortran keyword, if it is the lexer marks it as
the appropriate keyword and continues processing, otherwise it
indicates that a lexical error has occurred, and either stops
processing or makes its best guess of a possible fix and continues
lexing, perhaps generating spurious error messages if its assumed fix
is incorrect..
The second and third problems can be handled by using context to
identify which names are known from earlier context to be arrays (which
have to be explicitly declared early in the procedure, before the
ambiguities can appear). This is easily done in a handwritten recursive
descent parser. In automatic parsers the ambiguity is normally not
resolved, but the grammar input to the parser instead uses something
like statement_function_or_array_name (instead of separately
distinguishing statement _function_name and array_name for ambiguity 2)
and function_or_array_name (instead of function_name or array_name for
ambiguity 3) which is then disambiguated by a simple second pass over
the resulting parse.
<snip>
.
- References:
- BNF notation needed
- From: GroThar
- Re: BNF notation needed
- From: Brooks Moses
- BNF notation needed
- Prev by Date: Re: can't find libg2c on FC4
- Next by Date: Re: BNF notation needed
- Previous by thread: Re: BNF notation needed
- Next by thread: Re: BNF notation needed
- Index(es):
Relevant Pages
|