Re: CYK & Context-Free Expressions



Rock Brentwood <markwh04@xxxxxxxxx> writes:

No, Chris. Parsing theory is generally stated as an adjunction of the
general theory of syntax-directed translation -- not as an application
of language recognition. All parsers in common use work as transducer
with SSDT's or SDT's (YACC being the archtypical example); few ever
produce trees as output, though they generally can. All standard
references relate parsing theory as an adjunct of the more general
theory of syntax directed translation (and parsers to push down
transducers or more general transduers), both as a special case of the
translation problem and as its essential core.

I think on this we may have to agree to disagree (and if so, I
may follow your remarks with that assumption, as it may clear up some
points that are unclear to me). However, I will respond first as to
why I disagree.

I do not dispute the notion of syntax directed translation. In fact,
that is why I tried to formulate a way of defining an output
(translation) that would capture the relevant portions of tree
construction, since any translator which could produce that output
would have effectively solved the derivation tree problem.

However, I am still more comfortable as seeing "parsing" as a
derivation tree construction problem, and think that doing so captures
some important aspects of the parsing problem.

First, some of the issues in the differing power of LL versus LALR
parsers are clearly visible when you look at the derivation trees. In
particular, the grammars that are LL but not LALR all have references
to empty (nullable) rules that are disambiguated by the token
following the reference and where different contexts use different
tokens that are not disjoint. In the LL case, one simply examines the
token following the rule as part of the follow set in that context,
and after looking at that token during the parse constructs the
appropriate empy derivation tree for that element. In the LALR case,
the different nullable rules get merged and the context is lost, so
when constructing the derivation tree for the nullable non-terminals
one sees the union of the contextual follow sets and generates an
reduce-reduce conflict. Thus, if you view parsing as constructing
derivation trees, this problem becomes clear, as does the distinction
between the LL and LALR parsing methodologies. Further, it helps
explain why general LR parsers do not have the same issue as LALR ones
do, as the merging does not occur in the LR algorithm and you have the
context to create the correct derivation tree at the point where it is
required.

Second, in practice, when building a compiler the output of the syntax
directed translation is a form of parse tree (generally refered to as
an abstract syntax tree or AST). Many of the compiler algorithms work
on expression trees (e.g. Sethi-Ullman numbering) or a related form
such as triples or quadruples. Moreover, this connection is now
widely observed and facilitated in modern parser generators,
e.g. ANTLR, Cocktail, JavaCC with either JTB or JJTREE, LRGEN, and
Yacc++ to name just a few. I cannot cite who first introduced the
concept, but it was clearly present in Steve Johnson's paper, "Yacc
meets C++" and the writings of Josef Groelsch. Even, if you look at
the parser generators which process attribute grammars, many of them
(i.e. the ones which can handle attributes that require more than one
pass) work by constructing a tree as part of the parsing process.
Also, if you look at the "parse forest" work of Lang, Tomita, et. al.,
this is another variation on constructing a parse tree.

One does not generally take it to be something that applies merely to
grammars. Equivalence transformations on grammars produce inequivalent
parsing problems.

Definitely, but to me therein lies the rub. If one changes the
grammar, one changes the derivation tree that it constructs, and if
one is using that derivation tree as part of one's AST construction,
one has changed the output of the parse. In fact, in practice this is
a big issue. One has to be careful when bending a grammar to fit the
limitations of ones parsing tool to make certain one doesn't change
the grammar such that it no longer captures the relevant problem
(i.e. builds the wrong tree). In addition, if the translation uses a
stack (and the stack has partial derivations), if you change the
grammar, you may change what the contents of the stack and what is
available to the output construction routines.

The equivalence is with SSDT's (and more generally,
with transductions) not with grammars. The algebraic properties/
representations and transformations (as made especially clear by the
algebraic formulation of the LR paradigm) are always in reference to
SSDT's, not CFG's [Note 1]. When you draw the confusion between
recognition and parsing or treat parsing as meaning anything less than
translation at its essence, you only run into trouble and the above-
mentioned cloud of vagueness ... particularly on the question of what
"transformations" or "equivalences" apply (and what underlying
algebra) when and where. Consequently, the distinction is usually
drawn between parsing and recognition, the former equated as an
essential core of the more general problem of translation that it is a
special case of.

I agree that one should not confuse parsing with recognition. I don't
think that is our issue.

In contrast, equivalence DOES apply to transductions. So do algebraic
methods, such as those described in the notes in the archive your
posts are in reference to.

I can accept this point. However, I'm not sure your equivalence
captures the information which I want preserved (see next).

The parsing problem is generally regarded in our field, computer
science, as an inextricable part of the more general translation
problem; the latter is not regarded as a usurpation of the former, but
as its logical conclusion. Finding a "structure" is NOT generally
regarded by anyone as the essence of the problem, but merely as one
means (though not the only one) of acquiring an understanding or
translation of the input.

To me that "understanding" has structure and it is important for the
parser to capture that structure. That is what distinguishes parsing
from recogntion. Recognition, says yes this is a sentence of the
language, parsing says this is a sentence of the language and here is
the derivation wich proves it. The derivation capturing the relevant
structure, i.e. which noun does the adjective modify, is the noun
subject or object, etc. If you don't need the structure, one can
generalize the well-known parsing methods to recognize even with
poorly formed grammars. If you need the structure, then parsing
becomes hard.

Recognition is not parsing, as you suggested.

I did not mean to suggest that recognition is parsing. In fact, I
would go the opposite direction. I suspect I think there are
transductions that are not parses, which is why you can achieve
parsing bounds that make no sense to me, because those parses do not
capture something essential to what I mean by the term.

The parsing problem pertains to the actual "structure" of the system
comprising the transduction -- which is what is essentially captured
by whatever SSDT is used (not the grammar).

I think we are closer to agreement here. You want the structure to be
part of the SDT. I think that might be acceptable.

However, if I understand your term SSDT (and perhaps I do not), it is
close to what I mean by a deterministic PDA, which means it cannot
handle the entire set of context-free-languages only the deterministic
ones, and it is not surprising to me that you claim one can solve any
deterministic CFL in (deterministic) linear time. It is just applying
that result to non-deterministic ones that is counter-intuitive. I am
not aware of an algorithm that can parse a non-deterministic CFL in
linear time. But, for this point, I will admit to not understanding
all of your work, which is why I am frustrated. If I understood your
notation better, I could work through and see where what you are
saying diverges from what I "know". Perhaps, there is an insight that
I had never been exposed to. If so, I'd like to know it.

Much has evolved over the years and various refinements have been
taken to the naive statement to indicate what the "parsing" problem is
actually referring to. This all hinges on what manner of
representation is adopted; the most general case (context-free
expressions) subsuming the translation problem in its entirety.

It is not clear to me that context-free expressions subsume the entire
translation problem (and the SSDT issue is important here again).
However, perhaps both of these are issues with my not understanding.

To the
degree the problem is clearly stated, and to the degree that
references make clear, the parsing problem is, indeed, therefore an
adjunct and inextricable part of the general problem of transduction.

Aside from the points mentioned above, I don't have as issue with this
conclussion.

Thank you for reading,
-Chris

******************************************************************************
Chris Clark Internet: christopher.f.clark@xxxxxxxxxxxxxxxxxxxxxx
Compiler Resources, Inc. or: compres@xxxxxxxxxxxxx
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

.


Quantcast