Re: Fortran 77 parser
- From: "James Giles" <jamesgiles@xxxxxxxxxxxxxxxx>
- Date: Fri, 04 Apr 2008 19:11:52 GMT
glen herrmannsfeldt wrote:
Jon Harrop wrote:
(someone wrote)
Does it parse the following?
integerinteger,end
Some of the harder ones I remember have to do with DO statements
and FORMAT statements. Because of the problems with FORMAT
statements, WATFIV made the string 'FORMAT(' at the beginning
of a statement reserved.
DO10I1=1,3
10 FORMAT(I1)=3
DO 10 I1 = 1.3
Once again I'm surprised that Sale's algorithm isn't being used (and
that no one seems to know that these things are always handled
correctly by it). There are a few more cases in F90 and
beyond, but a suitable and fairly simple modification to Sale's
algorithm handles those too.
So, defining a simple concept: "Open context" means a symbol
that's not within parentheses and not in a string literal, hollerith
constant, or comment.
As I recall, 1) if a statement contains an open context equal sign (=),
the statement must be either an assignment statement, a DO statement,
or an IF statement. (Again, there are more cases in F90 and beyond.)
2) If a statement contains an open context comma(,), then it must begin
with a keyword. 3) If a statement contains an open context letter or
digit that immediately follows a close parenthesis, that statement
must begin with a keyword. 4) All other statements must begin
with a keyword.
Based on these rules (with only a little moderate effort) you can
identify the location of *all* the keywords in the source code.
So, even before lexical scanning, you can replace those keywords
with an appropriate special token that identifies the keyword.
Most of the keywords appear at the start of the statement. Most
of the rest appear (based on rule (3) above) immediately after
that close parenthesis in an IF statement.
Given
DO10I1=1,3
There is an open context equal and an open context comma. The
statement must begin with a keyword. DO is identified as that
keyword. No other keywords can be present in the statement.
10 FORMAT(I1)=3
There is an open context equal, but rules (2) and (3) don't apply.
By rule (1) the statement is an assignment. No keywords can be
present. Notice this is entirely independent of whether the statement
had a label. Sure, it's not permitted for a FORMAT not to be
labelled. But it *is* permitted for an assignment to be labelled.
So the choice can't depend on whether a label is present.
DO 10 I1 = 1.3
There is an open context equal. There are no commas or parens.
This is an assignment. No keywords are allowed. So, the compiler
doesn't even look for one.
Let's take the hardest of a previous article in the thread:
if(if)read(*,*)print,integer
No equal sign - so we already know the statement begins with
keyword. Given that it's an IF, the first place where you have an
open context letter preceeded by a close paren must also be a
keyword (READ in this case). There are no other keywords,
the algorithm doesn't even bother to look. Subtle yes, but not
excessively so.
The first published version of Sale's algorithm didn't replace the
keywords, it merely identified statement type (including the
"substatements" within IFs). It was about 500 lines of code
and was written in F66. Given that much of the code was more
complicated than necessary because it had to do character
manipulation with hollerith (he was, after all, helping people
that didn't yet have F77 prepare tools to support it). The proper
place to do this analysis is at the same time you remove comments
and stitch together continuations.
There is one other case I remember off the top of my head (and
I think there's yet another I don't remember, I'd have to read through
the existing lierature on the problem again). This case can't be
handled entirely by Sale's algorithm alone. Assuming that you are
going to allow identifiers longer that six characters, what do you
make of these:
INTEGER FUNCTION F(N)
or
INTEGER FUNCTION F(5)
Well, there is no equal sign in either, so the first token is the
INTEGER keyword. But what to make of the rest? If it's not
the first statement of a program unit, the next item *can't* be
a keyword (and you don't bother to look). If it is the first
statement of a program unit, the next token is FUNCTION for
the first of the above, but it's still just the identifier FUNCTIONF
in the second case. The decision depends on the syntactic
category of the things between the parentheses. You just have
to bite the bullet and special case the first line of program
units (only if the first non-blank characters are a typename
keyword followed by FUNCTION). Of course, if you don't
allow identifiers longer than six characters, the second
token must be a keyword, it's too long to be an identifier!
All this stuff is from memory, so I may have made some minor
errors somewhere. If I were to do it for real I'd have to review
my old notes. But getting this stuff right is about this level
of complexity. If Fortran really *was* hard to process correctly,
then how did they do it with computers we wouldn't consider
adequate to run a toaster oven?
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
.
- Follow-Ups:
- Re: Fortran 77 parser
- From: Greg Lindahl
- Re: Fortran 77 parser
- From: James Van Buskirk
- Re: Fortran 77 parser
- From: Richard Maine
- Re: Fortran 77 parser
- References:
- Re: Fortran 77 parser
- From: Jon Harrop
- Re: Fortran 77 parser
- From: Tobias Burnus
- Re: Fortran 77 parser
- From: Jon Harrop
- Re: Fortran 77 parser
- From: glen herrmannsfeldt
- Re: Fortran 77 parser
- Prev by Date: Re: The structure contains one or more misaligned fields
- Next by Date: Re: Zero-size arrays
- Previous by thread: Re: Fortran 77 parser
- Next by thread: Re: Fortran 77 parser
- Index(es):
Relevant Pages
|
|