Re: Interpreted Basic control-flow analysis

Arthur J. O'Dwyer wrote:
[multi-posted to moderated comp.compilers, unmoderated comp.programming]

I have an apparently non-trivial problem in control-flow analysis.
I'm trying to make a static control-flow analyzer for programs written
in TI-83 Basic, an interpreted language. The input is a Basic program,
line by line; and the output is a mapping from lines to lists of their
successors. (See example below.)

The important control structures supported by TI-83 Basic are
If-Then-Else-End, While-End, and Goto-Label.

At runtime, the interpreter maintains a stack of all the currently
enclosing control structures. A new entry is pushed when encountering
If-Then or While, and popped when encountering End. Notice that Goto
does not affect the state of the control stack --- this is the crux
of the problem! The runtime nesting of control structures isn't
necessarily reflected by a lexical nesting.

The successors of If-Then, Else, While, and Goto are always
lexically determined[1] and thus easy to compute. The successors of
"End" are what I'm having trouble with.

Consider the following trivial example, in which the lexical nesting
does reflect the runtime semantics:

1. If X successors are 2
2. Then " 3, 8
3. While Y " 4, 6
4. A " 5
5. End " 4, 6
6. B " 7
7. End " 8
8. C " none

Now consider the following example, in which the lexical nesting
doesn't quite match the runtime behavior:

1. While X " 2, 4
2. Goto L " 6
3. End " unreached
4. While Y " 5, 7
5. Lbl L " 6
6. End " 2, 5

The "End" in line 6 corresponds to "While X" if it was reached
via line 2 ("Goto L"), but it corresponds to "While Y" if it was
reached via line 4 (which is where control ends up if X is false
the first time line 1 is reached).

I'm looking for an algorithm that can handle this kind of language.
I want the resulting successor list to be as conservative as possible
--- e.g., it is not acceptable to give every line in the program as
the successor list of every End. (I don't care what's given for
unreachable lines like line 3 above, though.)

I've thought of recursively simulating a run through the program,
taking both branches at each decision point, and keeping control
stacks for each thread so I always know where each "End" goes. But
that needs a way to detect and avoid infinite loops. The only way
I've thought of is to avoid duplicating any (line number, control
stack) state; but that involves keeping arbitrarily large amounts
of data, and still doesn't deal with the following program, which
keeps expanding the control stack forever since it never closes any
of its "While" loops.

1. Lbl L " 2
2. While X " 3
3. Goto L " 2

Any ideas?

(Possibly useful notes: I'm writing the analyzer in C. Parsing
lines to see whether they're "If..." or "While..." is easy; parsing
expressions is harder than usual, and I'd prefer not to do it at
all, for the time being. TI-83 Basic also has control structures
that let you skip exactly one line, or "hide" a line from the
lexical thingie[1] but have it run at runtime. I don't think they're
relevant to the problem, but I'll explain more if asked.)


[1] - The two [1]s refer to the same thingie.

Unfortunately this problem is undecidable. You can't write a program
that will give the right successors in all cases. But you can get a
useful approximation.

Your idea of interpreting the program is excellent, except that you
probably shouldn't worry about values at all. Instead, trace all
possible execution paths, e.g. with a breadth first search of the
control graph while updating the (initially empty) successor set for
each statement. "Then" and "While" statements each take two possible
branches (simultaneously). When a successor update doesn't change the
set, cut that branch of the search. The seach must stop because the
sets have a finite maximum size--the number of lines.

The problem as you say is the successors of "End." You can make a big
approximation by saying the first statement in every While body
succeeds every End.. Probably not good enough. You can do better by
making your program prove a theorem. One approach is an analysis
pre-pass to determine which While statements can "reach" each End.
This will be roughly similar to "reaching definitions," which is a
standard compiler dataflow algorithm. It's just interpreting the
program in a different way, attaching sets of reaching Whiles to each
End. Check Aho, Sethi, and Ulman, Principles of Compiler Design.