Re: Fortran 77 parser
- From: *** Hendrickson <***.hendrickson@xxxxxxx>
- Date: Fri, 04 Apr 2008 22:40:34 GMT
James Giles wrote:
Richard Maine wrote:
James Giles <jamesgiles@xxxxxxxxxxxxxxxx> wrote:
So, defining a simple concept: "Open context" means a symbol
that's not within parentheses and not in a string literal, hollerith
constant, or comment.
The Hollerith part is a pain because you can't readily tell whether
something is a Hollerith without parsing the statement, which is a bit
of a circularity issue. Plus the possibility that a quote or parens
might be part of a Hollerith makes the rest of it harder in the
presence of Hollerith also.
This is overstatement. To be sure, you do need to know a small
subset of the syntax rules of Fortran to find hollerith. But it's a
very small subset.
Yes, but. FORTRAN 77 code was rife with extensions. I generally agree
with you that Sale's algorithm is a good thing. But, it needs to
be aware of things like
REAL*4 HENRY
when it strips out Hollerith. For good or bad, it's a provable
fact that "working" parsers tripped over that one. :(
Similarly, things like
DOUBLE PRECISION NAME
were potentially ambiguous with compilers that allowed DOUBLE
as a keyword and also allowed names longer than 6 characters.
It's OK to say that the parser should only accept standard conforming
code. But, if you're trying to develop a commercially useful
parser, you need to worry about the non-standard stuff.
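
To make the REAL*4 HENRY trap concrete, here is a minimal Python sketch.
It is not anyone's actual parser; the names find_hollerith, TYPE_LEN and
find_hollerith_guarded are made up for illustration. The first function
applies the plain rule (digits after a non-alphanumeric, then H, blanks
disregarded). The second adds one possible guard, assuming the extension
shows up only as a type keyword followed by *n at the start of the
statement and that the keyword itself has no embedded blanks.

```python
import re

def find_hollerith(stmt):
    """Return (count, text) for the first apparent Hollerith, else None."""
    s = stmt.upper()
    prev = ""          # last non-blank character seen so far
    i = 0
    while i < len(s):
        c = s[i]
        if c == " ":
            i += 1
            continue
        if c.isdigit() and not prev.isalnum():
            j = i
            # Blanks are insignificant, so "4 H" counts the same as "4H".
            while j < len(s) and (s[j].isdigit() or s[j] == " "):
                j += 1
            if j < len(s) and s[j] == "H":
                count = int(s[i:j].replace(" ", ""))
                return count, stmt[j + 1 : j + 1 + count]
            prev = "0"  # the run was an ordinary number
            i = j
            continue
        prev = c
        i += 1
    return None

# Hypothetical guard: a leading type keyword with a *n length suffix.
TYPE_LEN = re.compile(r"\s*(INTEGER|REAL|COMPLEX|LOGICAL)\s*\*\s*\d+", re.I)

def find_hollerith_guarded(stmt):
    """Same rule, but skip over a leading TYPE*n before looking."""
    m = TYPE_LEN.match(stmt)
    start = m.end() if m else 0
    return find_hollerith(stmt[start:])
```

With the plain rule, "      REAL*4 HENRY" is reported as the Hollerith
4HENRY (count 4, text "ENRY"); with the guard it is left alone, while
"      CALL F(4HABCD)" is still found.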
*** Hendrickson
.
Going left to right, if you're already in a comment, you don't need
to look for hollerith (or string literals, or Sale's algorithm info).
If you're not in a comment, then whichever you encounter first
of the following dominates: 1) a string literal is denoted by the
appearance of a quote or an apostrophe and runs until you find
a corresponding quote or apostrophe; 2) if you find a digit (or
a sequence thereof) following something that's not a digit or
letter (disregarding spaces of course) that's followed by the
letter H (or h, if you're adventurous enough to allow your F77
parser to process lower case) then you are in a hollerith.
In a string literal you don't even bother to look for anything other
than the delimiter that began it. In a hollerith constant you just skip
the specified number of characters without looking at them in any
way at all. Once you reach the end of either, you continue
searching for the next one (and for patterns I described as
part of Sale's algorithm). These rules apply even if you decide
to allow the extension of trailing comments (!). You just have
to scan for the exclamation point as well. (You have to get
rid of those to pack together the continuation lines anyway.)
You make these decisions independent of any other context since
you can prove that either these are valid choices or the statement
you are processing is in error anyway. Further, it can be shown
that if the statement you are processing *is* in error, using these
rules won't mistakenly make you believe there's no error. (Yeah,
I researched that last point very carefully too.)
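
The left-to-right rules above can be sketched roughly as follows in
Python. This is an illustration under assumptions, not the code under
discussion: the function name mask_literals is invented, the input is
assumed to be the statement text of a line that is not a full-line
comment, and masked characters are replaced by '_' so that a later pass
(such as Sale's algorithm) sees only the remaining characters.

```python
def mask_literals(stmt):
    """Replace string-literal, Hollerith and trailing-comment characters by '_'."""
    out = []
    n = len(stmt)
    prev = ""          # last non-blank character outside any literal
    i = 0
    while i < n:
        c = stmt[i]
        if c == "!":
            # Trailing-comment extension: everything to end of line is masked.
            out.append("_" * (n - i))
            break
        if c in ("'", '"'):
            # String literal: look only for the delimiter that began it.
            # A doubled delimiter simply ends one literal and starts the next.
            j = stmt.find(c, i + 1)
            if j < 0:
                j = n - 1      # unterminated: the statement is in error anyway
            out.append("_" * (j - i + 1))
            prev, i = c, j + 1
            continue
        if c.isdigit() and not prev.isalnum():
            j = i
            while j < n and (stmt[j].isdigit() or stmt[j] == " "):
                j += 1
            if j < n and stmt[j] in ("H", "h"):
                # Hollerith: skip the counted characters without examining them.
                count = int(stmt[i:j].replace(" ", ""))
                end = min(n, j + 1 + count)
                out.append("_" * (end - i))
                prev, i = "H", end
                continue
            out.append(stmt[i:j])   # just a number
            prev, i = "0", j
            continue
        out.append(c)
        if c != " ":
            prev = c
        i += 1
    return "".join(out)
```

For example, mask_literals("A = 'B!C'") gives "A = _____", and the quote,
paren and exclamation point inside the Hollerith in "CALL F(3H'(!, X)"
are masked without being examined, giving "CALL F(_____, X)".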
[...] I think there are some cases of
extensions in Format statements that make things even nastier,
perhaps even ambiguous. I seem to recall things like making some
commas optional having effects like that.
Not optional commas (at least those don't confuse Sale's algorithm).
But the CDC extension of using asterisks to delimit Format string
specifiers - coupled with optional commas - defeats Sale's algorithm
very effectively:
10 FORMAT(I1*ABC) = DEF(XYZ*I2)
Is this a format statement containing the format specifiers I1,
*ABC) = DEF(XYZ*, and I2 or is it an assignment statement?
If the commas weren't optional the issue might be simple:
a comma (or an open paren) followed by an asterisk begins
a format string specification and an asterisk followed by a
comma (or a close paren) terminates such. That rule is
unambiguous and only makes mistakes when the statement
being processed is in error anyway.
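
A rough Python sketch of that mandatory-comma rule, purely as an
illustration (the function name mask_star_strings is hypothetical, and
it handles only the asterisk delimiters, not quotes or Hollerith):

```python
def mask_star_strings(stmt):
    """Mask *...* format strings, assuming commas between specifiers are required."""
    out = list(stmt)
    n = len(stmt)
    prev = ""          # last non-blank character outside a string
    i = 0
    while i < n:
        c = stmt[i]
        if c == "*" and prev in ("(", ","):
            # Opening asterisk: search for an asterisk followed by , or )
            j = i + 1
            while j < n:
                if stmt[j] == "*":
                    rest = stmt[j + 1:].lstrip(" ")
                    if rest[:1] in (",", ")"):
                        break
                j += 1
            if j == n:
                break  # no closing asterisk: statement is in error, stop masking
            for k in range(i, j + 1):
                out[k] = "_"
            prev, i = "*", j + 1
            continue
        if c != " ":
            prev = c
        i += 1
    return "".join(out)
```

With the commas written out, "FORMAT(I1,*ABC) = DEF(XYZ*,I2)" has its
string masked, delimiters included. The comma-less form above contains
no asterisk that follows a comma or open paren, so under this rule it is
returned unchanged and holds no string at all.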
As it is, CDC just cheated - any labelled statement whose first
seven non-blank characters were FORMAT( was automatically
considered a format statement.
(For those that don't know, the CDC character set in those days
didn't have either quote (") or apostrophe (') in it.)
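
The CDC shortcut described above amounts to a one-line test. A minimal
sketch in Python, assuming the label field and the statement text have
already been separated (the function name cdc_is_format and its two
parameters are invented for illustration):

```python
def cdc_is_format(label, stmt):
    """True if a labelled statement's first seven non-blank characters are FORMAT(."""
    packed = stmt.replace(" ", "").upper()
    return bool(label.strip()) and packed.startswith("FORMAT(")
```

Under this test the ambiguous statement 10 FORMAT(I1*ABC) = DEF(XYZ*I2)
is always classified as a format statement, and the same text without a
label is not.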