Re: Handling error/status messages by interface to C++ programs
- From: "H. S. Lahman" <h.lahman@xxxxxxxxxxx>
- Date: Fri, 11 Sep 2009 17:09:52 GMT
Responding to Bieniasz...
What you are saying is interesting, and I thank you for your
advice on my problem. But, as I am not so fluent in all these details
about how to design a compiler, please tell me where I can find
examples of the C++ code implementing the various concepts you are talking about.
The last time I worked on a compiler was in the late '80s and the language was BLISS. Consequently I haven't paid much attention to the literature (my copy of "Compilers" by Aho was the 1st edition; somebody borrowed it about a decade ago and I haven't seen it since). So I would not be much help for finding C++ code. B-( But I can provide the Executive Summary overview...
Most languages are LALR(n) (Look-Ahead LR: the input is scanned Left-to-right and a Rightmost derivation is built, using n tokens of look-ahead). This means you can figure out what to do in the next parsing step by knowing what the previous, current, and next tokens are. So the simplistic procedural approach is essentially a switch statement where each case invokes a specialized procedure for the current token. That procedure will usually have some processing (e.g., saving context information about the current token as history or doing any work that can be done with the information in hand) and another switch statement for the next token. The last procedure for the statement terminal then "walks" the history values and does whatever work remains to be done to complete the statement. Thus one nests procedures linearly through the statement, token-by-token.
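To make the procedural shape concrete, here is a minimal C++ sketch of that nesting. Everything in it is an assumption made for illustration: the contrived statement forms (OPEN FILE name name, OPEN MENU name), the names Kind, Tok, History, and the parse* procedures. It is not taken from any real compiler.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for a contrived OPEN statement.
enum class Kind { Open, File, Menu, Name, End };

struct Tok { Kind kind; std::string text; };

// Context saved by each procedure for the last one to "walk".
struct History { std::vector<std::string> values; };

// Treat running off the end of the input as the statement terminal.
Kind kindAt(const std::vector<Tok>& t, std::size_t i) {
    return i < t.size() ? t[i].kind : Kind::End;
}

// Last procedure in the chain: walks the history to complete the statement.
std::string finishStatement(const History& h) {
    std::string out;
    for (const auto& v : h.values) out += (out.empty() ? "" : " ") + v;
    return out;
}

// Procedure for a name token: save it, then switch on the next token.
std::string parseName(const std::vector<Tok>& t, std::size_t i, History& h) {
    h.values.push_back(t[i].text);
    switch (kindAt(t, i + 1)) {
        case Kind::Name: return parseName(t, i + 1, h);
        case Kind::End:  return finishStatement(h);
        default:         return "error: unexpected token after name";
    }
}

// Procedure for FILE or MENU: save which one, then switch on the next token.
std::string parseTarget(const std::vector<Tok>& t, std::size_t i, History& h) {
    h.values.push_back(t[i].kind == Kind::File ? "file" : "menu");
    switch (kindAt(t, i + 1)) {
        case Kind::Name: return parseName(t, i + 1, h);
        default:         return "error: name expected";
    }
}

// Procedure for OPEN.
std::string parseOpen(const std::vector<Tok>& t, std::size_t i, History& h) {
    h.values.push_back("open");
    switch (kindAt(t, i + 1)) {
        case Kind::File:
        case Kind::Menu: return parseTarget(t, i + 1, h);
        default:         return "error: FILE or MENU expected";
    }
}

// Top-level switch on the first token of the statement.
std::string parseStatement(const std::vector<Tok>& t) {
    History h;
    switch (kindAt(t, 0)) {
        case Kind::Open: return parseOpen(t, 0, h);
        default:         return "error: unknown statement";
    }
}
```

Note how every procedure hard-codes the decision about what may come next; that is the part the more generic approach moves into data.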
The more generic approach that I described is more of a design issue than a coding issue. The problem space is now compilation itself and one abstracts objects that reflect the invariants of that space. So when textbooks like Aho talk about syntax tables and use terminology like 'terminal' or 'production', one abstracts those things as objects. They then have fairly obvious knowledge attributes like keyword identifier text strings. They also have behavior responsibilities for the things a compiler would do with the entity like executing a particular BNF (Backus-Naur Form, the most common specification for LALR languages) production.
For such a generic compiler, one needs to instantiate the objects and their relationships. Typically that would be done by reading a BNF specification for the target language and invoking a bunch of "factory" objects. Once the objects are properly initialized, one kicks off the processing by "walking" the actual syntax token-by-token. Instead of a switch statement, though, the messages are addressed by relationship navigation and the specific relationship is instantiated based on the history already processed as one does the "walking".
The "walking" is usually over the input tokens. Those are mapped into syntax rows defined by and instantiated from the BNF. The actual processing is done by "productions" associated with particular elements of the particular syntax rows.
The same decisions about where one is in the statement and what the next token is still need to be made. However, since relationship instantiation is encapsulated separately from collaborations, those decisions can be made elsewhere from the production processing for the current token. For example, suppose we have the following <contrived> syntax
OPEN := FILE {compound name} {alpha name} // directory path, file name
:= MENU {alpha name} // menu ID
and suppose we <simplistically> have modeled the following fragment:
               [Token]
             + identifier
                  A
                  | R1
      +-----------+----------+----...
      |           |          |
  [Keyword]  [alpha name]  [compound name]
      | 1
      | starts with
      |
      | R2
      |
      | *
 [Syntax Row]
The BNF specification will define and instantiate the [Syntax Row] and [Token] objects. There are potentially a combinatorial number of relationships among the fundamental token types for the various syntax rows. However, within each syntax row there is exactly one set of relationships. So for the row OPEN := FILE we might instantiate two instances of [Keyword], one of compound name, and one of alpha name. Then we daisy chain them with just three simple 1:1 relationships.
The tricky part is when we do that initialization. We can't simply instantiate them when the language BNF specification is read because we won't know what the identifier values are until we actually read the input language statements. The advantage the OO approach has is that we can make the instantiation as we "walk" the actual tokens but the decisions can be buried in the syntax table itself. Thus the [Syntax Row] attributes might be {{Row ID = Token 1 type, Token 2 type}, ordered token type list}. IOW, instead of run-time switch logic we hard-wire the specific row connections from the BNF when a [Syntax Row] is instantiated.
Then when we "walk" the actual input tokens we also "walk" the ordered list of token types in the Syntax Row in lockstep to instantiate the right [Token] subclass using the identity from the input token and to instantiate a relationship. Then when the production doing this is done, it navigates that relationship by sending a message to that new Token object. That message essentially says, "I'm done doing my processing so it's time for you -- whoever you are -- to do yours."
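A rough C++ sketch of that lockstep walk might look like the following. The class and function names (Token, Keyword, AlphaName, CompoundName, SyntaxRow, makeToken, instantiateRow) are hypothetical, the input is assumed to be already split into strings, and the row is identified simply by its first two keywords. The only switch left is inside the factory, and it is driven by the row's data rather than by the parser's position.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class TokenType { Keyword, AlphaName, CompoundName };

// [Token]: a data holder with an identifier and a 1:1 link to the next token.
struct Token {
    explicit Token(std::string id) : identifier(std::move(id)) {}
    virtual ~Token() = default;
    virtual TokenType type() const = 0;
    std::string identifier;
    std::unique_ptr<Token> next;  // relationship instantiated during the walk
};

struct Keyword : Token {
    using Token::Token;
    TokenType type() const override { return TokenType::Keyword; }
};
struct AlphaName : Token {
    using Token::Token;
    TokenType type() const override { return TokenType::AlphaName; }
};
struct CompoundName : Token {
    using Token::Token;
    TokenType type() const override { return TokenType::CompoundName; }
};

// [Syntax Row]: row ID plus the ordered token type list taken from the BNF.
struct SyntaxRow {
    std::string first, second;     // e.g. "OPEN", "FILE"
    std::vector<TokenType> types;  // one entry per token in the row
};

// Factory: the token type comes from the row, the identifier from the input.
std::unique_ptr<Token> makeToken(TokenType t, const std::string& id) {
    switch (t) {
        case TokenType::Keyword:      return std::make_unique<Keyword>(id);
        case TokenType::AlphaName:    return std::make_unique<AlphaName>(id);
        case TokenType::CompoundName: return std::make_unique<CompoundName>(id);
    }
    return nullptr;
}

// Walk the input and the row's type list in lockstep, daisy-chaining tokens.
// Returns the head of the chain, or nullptr if the input does not fit the row.
std::unique_ptr<Token> instantiateRow(const SyntaxRow& row,
                                      const std::vector<std::string>& input) {
    if (input.size() != row.types.size() || input.size() < 2 ||
        input[0] != row.first || input[1] != row.second) {
        return nullptr;
    }
    std::unique_ptr<Token> head;
    Token* tail = nullptr;
    for (std::size_t i = 0; i < input.size(); ++i) {
        auto tok = makeToken(row.types[i], input[i]);
        Token* raw = tok.get();
        if (tail) tail->next = std::move(tok); else head = std::move(tok);
        tail = raw;
    }
    return head;
}
```

For the OPEN := FILE row this yields four linked tokens joined by three 1:1 relationships, with no decision logic in the walk itself beyond checking that the input matches the row.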
But usually an object like [Token] would be just a data holder that conveniently holds the BNF specification of the language (via its token type and relationships) and the input data values of the actual text being parsed. The compiler theory usually assumes that all the dynamic heavy lifting will be done by 'production' objects. So we might have another generalization that might look something like:
                     [Production]
                          A
                          |
      +-------------+-------------+--------------+-----...
      |             |             |              |
  [Keyword]  [Directory path]  [Menu Name]   [File Name]
This essentially is just a GoF Strategy pattern that does the right thing for each token in the syntax row. So when we instantiate each token in the row we also need to instantiate a relationship to the right production. Thus {alpha name} is linked to the [File Name] production in the OPEN := FILE row while {alpha name} is linked to [Menu Name] in the OPEN := MENU row. Then the messages I mentioned above move between the [Production] objects, navigating through the associated [Token] objects to get to the right one.
One can cleverly handle that through the [Syntax Row] data structure as well because that mapping exists in the BNF specification since it doesn't change for a particular syntax row. That makes the syntax row object a little more complicated but it saves more IF statements at run time. Essentially we have cast the BNF specification into attributes of the [Syntax Row] to express statically some complex decisions about where to go next in parametric data. Those same decisions were done dynamically with switch statements in a procedural approach.
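As a hedged illustration of casting that mapping into the row's data, the sketch below stores one production per position in each row. All names (Production, KeywordProd, DirectoryPath, FileName, MenuName, SyntaxRow, makeOpenFileRow, makeOpenMenuRow, runRow) are invented for the example, and a simple loop stands in for the production-to-production messages described above.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// [Production]: Strategy interface, one behavior per kind of row element.
struct Production {
    virtual ~Production() = default;
    virtual std::string execute(const std::string& identifier) const = 0;
};
struct KeywordProd : Production {
    std::string execute(const std::string& id) const override { return "keyword:" + id; }
};
struct DirectoryPath : Production {
    std::string execute(const std::string& id) const override { return "dir:" + id; }
};
struct FileName : Production {
    std::string execute(const std::string& id) const override { return "file:" + id; }
};
struct MenuName : Production {
    std::string execute(const std::string& id) const override { return "menu:" + id; }
};

// [Syntax Row]: the production for each position is fixed when the row is built.
struct SyntaxRow {
    std::vector<std::shared_ptr<const Production>> productions;
};

// OPEN := FILE {compound name} {alpha name}
SyntaxRow makeOpenFileRow() {
    SyntaxRow row;
    auto kw = std::make_shared<KeywordProd>();
    row.productions.push_back(kw);
    row.productions.push_back(kw);
    row.productions.push_back(std::make_shared<DirectoryPath>());
    row.productions.push_back(std::make_shared<FileName>());   // alpha name here is a file name
    return row;
}

// OPEN := MENU {alpha name}
SyntaxRow makeOpenMenuRow() {
    SyntaxRow row;
    auto kw = std::make_shared<KeywordProd>();
    row.productions.push_back(kw);
    row.productions.push_back(kw);
    row.productions.push_back(std::make_shared<MenuName>());   // alpha name here is a menu ID
    return row;
}

// Run each production against the matching input token; no IF on token meaning.
// Returns an empty result if the input length does not fit the row.
std::vector<std::string> runRow(const SyntaxRow& row,
                                const std::vector<std::string>& input) {
    std::vector<std::string> out;
    if (input.size() != row.productions.size()) return out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        out.push_back(row.productions[i]->execute(input[i]));
    }
    return out;
}
```

The same {alpha name} input is processed as a file name in one row and as a menu ID in the other purely because of how each row was populated.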
For the time being, I feel convinced that designing some sort of polymorphic messenger classes for displaying the various messages
from my "compiler", in an interface-dependent way,
is a reasonable idea, and that using exceptions intensively
for managing the error detection is rather a bad idea.
However, I have to say I have seen a simple example of an object-oriented parser design in some book (unfortunately I forgot
the title; the author had a Polish-sounding name), where exceptions were
used and recommended as an elegant solution to the reaction to error detection in the source text.
Alas, there is a lot of bad advice running around. I still see lots of people routinely using downcasts, behavior methods that return values, ubiquitous generalization, and other bad practices.
<Hot Button>
I think the problem lies in the fact that in the early '90s a lot of people converted to the OO paradigm from procedural development and they did so by going directly to OOPL programming. So they were desperately looking for something familiar in OOP. That caused them to overlay procedural design techniques on the OOPL code. That was easy to do since the OOPLs are still 3GLs and necessarily use procedural block structuring, stack-based scope, and procedural message passing. The result was a lot of C and FORTRAN programs with strong typing that were nearly as unmaintainable as they would have been without using an OOPL. The problem is that those guys are now writing OO books.
Procedural and functional paradigms are very intuitive in a computational environment because they closely map into the hardware computational models of Turing, von Neumann, et al. But the OO paradigm is focused on the problem space rather than the computing space. Consequently when properly done the OO paradigm is not intuitive from a pure hardware computation perspective. So to do the paradigm properly one *must* learn about proper OOA/D first. One doesn't need to use UML bubbles & arrows to do good OOA/D, but one mentally needs a good OOA/D vision before attempting OOP.
So my advice to anyone just starting out in OO development is to get an OOA/D book by one of the classic OO authors (Jacobson, Wirfs-Brock, Booch, et al) who were actively writing in the late '80s. Otherwise one risks getting "Structured Design for OOPLs".
</Hot Button>
In addition, I cannot completely avoid using exceptions in my
program. Without going too much into details, I can say that in order
to perform various partial tasks of my compiler, the compiler needs
to perform some kinds of numerical calculations on rational numbers
for which I am going to have a dedicated class. As the rational arithmetic operations may sporadically end up with problems such as "overflow", in such cases exceptions have to be thrown.
So, if my compiler cannot perform the above calculations,
it has to catch the exceptions and signal such operational errors
to the user.
I would argue that, as described, such errors are implicit in the subject matter. In addition, you know you need to report them. Therefore they are expected during normal processing (albeit hopefully rarely). So I think you should handle them through some sort of normal flow of control.
Note that detecting them is ultimately done externally by the hardware as a hardware interrupt. That is subsequently abstracted by the OS into a software interrupt to your program. You have no control over that so the only way your program can "see" the error is by using the language facilities for exceptions.
However, that detection mechanism is quite different from relying on exception processing to manage flow of control for recovery in your application. So I would limit the use of exception processing facilities to simply detecting such errors within the local scope where they occur. Use your application's normal flow of control mechanisms for backing out of local scope, reporting, and other recovery processing.
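One way to read that advice in C++ terms is sketched below (C++17 for std::optional). It rests on assumptions that are not in the original question: the Rational struct, the narrow, multiply, and tryMultiply functions, and the Outcome type are all hypothetical, and overflow is detected here by an explicit range check that throws std::overflow_error rather than by any hardware mechanism. The exception is caught in the immediate caller and turned into an ordinary return value.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical rational number with deliberately small components.
struct Rational { std::int32_t num; std::int32_t den; };

// Throws if a 64-bit intermediate result does not fit in 32 bits.
std::int32_t narrow(std::int64_t v) {
    if (v < std::numeric_limits<std::int32_t>::min() ||
        v > std::numeric_limits<std::int32_t>::max()) {
        throw std::overflow_error("rational overflow");
    }
    return static_cast<std::int32_t>(v);
}

// The arithmetic class signals overflow with an exception (no reduction done).
Rational multiply(Rational a, Rational b) {
    return { narrow(std::int64_t{a.num} * b.num),
             narrow(std::int64_t{a.den} * b.den) };
}

// Result handed to the rest of the application: a value or an error text.
struct Outcome {
    std::optional<Rational> value;
    std::string error;
};

// The exception is used only for detection, in the scope where it occurs;
// callers see a normal return value and use normal flow of control.
Outcome tryMultiply(Rational a, Rational b) {
    try {
        return { multiply(a, b), "" };
    } catch (const std::overflow_error& e) {
        return { std::nullopt, e.what() };
    }
}
```

A caller would test `outcome.value` and, if it is empty, pass `outcome.error` to whatever reporting object the application uses, so no try/catch blocks appear above this one function.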
[BTW, I think a lot of people would argue that any hardware interrupt indicates a poorly formed software application and should be handled by exceptions because it should never happen and, therefore, is never expected. However, if the problem arises due to something like bad manual input (e.g., a value outside expected computational range), then the overflow is simply the manifestation of a problem that would be implicitly expected whenever there is manual data entry. The counter, of course, is that one should have checked the range when the value was entered rather than when the computation was performed. One can go around and around on such examples. So my position is that I don't think one can rule out that something like an overflow *might* be an expected error, especially in complex numeric processing. OTOH, I think the situations where it is not due to a poorly formed application would be very rare indeed.]
One thing that I am not sure of, though, is whether in the case of
signalling errors by exceptions, it would be possible to design
a compiler which detects more than one error in a single pass?
That's definitely a Maybe. B-)
To process multiple errors in a single pass you need to ensure that an error will not prevent correct processing subsequently (i.e., when normal processing continues the application is in a defined and stable state that will enable correct processing subsequently). That will almost always require some sort of recovery action. Whether that recovery can do that for _all possible_ subsequent processing depends on how much trouble you are willing to go to, what kind of error occurs, and what sort of processing you are doing.
For example, when parsing statements it may not be feasible to recover to a state where you can correctly find more errors later in that statement. But you very likely could recover to a point where you can find errors in subsequent statements.
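A minimal sketch of that statement-level recovery follows. It assumes statements are terminated by ';' and uses the contrived OPEN syntax from earlier; Diagnostic, splitWords, checkStatement, and checkProgram are hypothetical names. Only the first error in each statement is recorded, then checking resumes at the next statement, with errors carried as ordinary return values.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Diagnostic {
    std::size_t statement;  // 1-based statement number
    std::string message;
};

// Split one statement into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& s) {
    std::istringstream in(s);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// Returns an empty string if the statement is well formed,
// otherwise a description of the first error found in it.
std::string checkStatement(const std::vector<std::string>& w) {
    if (w.empty() || w[0] != "OPEN") return "expected OPEN";
    if (w.size() < 2 || (w[1] != "FILE" && w[1] != "MENU")) {
        return "expected FILE or MENU";
    }
    const std::size_t want = (w[1] == "FILE") ? 4 : 3;
    if (w.size() != want) return "wrong number of names";
    return "";
}

// One pass over the whole source. An error abandons only the current
// statement; the terminator ';' is the resynchronization point.
std::vector<Diagnostic> checkProgram(const std::string& source) {
    std::vector<Diagnostic> diags;
    std::istringstream in(source);
    std::string stmt;
    std::size_t index = 0;
    while (std::getline(in, stmt, ';')) {
        ++index;
        const auto words = splitWords(stmt);
        if (words.empty()) continue;  // blank text between terminators
        const std::string err = checkStatement(words);
        if (!err.empty()) diags.push_back({index, err});
    }
    return diags;
}
```

Because each statement is checked from a clean state, an error in one statement cannot corrupt the checking of the next, which is the "defined and stable state" requirement in its simplest form.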
Similarly if the error results from the GUI resource memory overflowing, it is unlikely you can do anything about it except provide a graceful abort.
--
Life is the only flaw in an otherwise perfect nonexistence
-- Schopenhauer
H. S. Lahman
H.lahman@xxxxxxxxxxx
software blog: http://pathfinderpeople.blogs.com/hslahman/index.html