Re: CYK & Context-Free Expressions
- From: Chris F Clark <cfc@xxxxxxxxxxxxxxxxxxxx>
- Date: Sat, 26 Apr 2008 12:14:10 -0400
Rock Brentwood <markwh04@xxxxxxxxx> writes:
LR-based methods (or anything else that gives you grammars (or parse
forests) as outputs) have you O(n^3) complexity for (the canonical
extensions of) Chomsky normal form grammars.
Ok, this matches the understanding that I have. This is an upper
bound, because we have algorithms that achieve this. I don't know
that this has been proven to be a lower bound.
A more direct representation in terms of context-free expressions of
the output set T[w] for a word w and translation T is easy to come up
with that is O(n) in the length of the word. This, of course, includes
the cases where T is the canonical top-down and bottom-up extensions
of a grammar, as special cases.
If T is a context-free subset of the monoid X* x Y*, where X and Y are
finite, then the context-free expression for T is simply
T = <0| (X' + Y' + H')* |0>
where
X' = sum_{x in X} (x <x|)
Y' = sum_{y in Y} (y <y|)
H' = sum_{q -> v1...vk} (|vk>...|v1><q|)
over the bra-ket algebra C_{x+y+q} where x = |X|, y = |Y| and q = |Q|,
Q being the non-terminals. This reduces to
T = <0| N (X' N)* |0>
where
N = (Y' + H')*.
For a given word w = x1 x2 ... xn, the corresponding parse structure
is simply the context-free expression
T[w] = <0| N <x1| N <x2| ... N <xn| N |0>.
The recognition problem is obtained directly from this under by monoid
homomorphism h: (Y union Q)* -> Q* that maps h: Y -> {1}. The
expression h(T[w]) becomes a pure bra-ket expression that reduces to 0
or 1. It reduces to 1 if and only if T[w] is not empty. The actual
process of reducing the bra-ket expression is closely linked to the
shortest paths algorithm and Boolean matrix multiplication. As best
known at present, this is O(n^log 7) in complexity.
Although I still have to analyze it in more detail, this looks to be a
suitable definition of the recognition problem for a specific language
class.
As you can see by the various refinements (and the mere need to have
such refinements), the current state of the art in parsing theory is
still at a primordal state rife with imprecision of teminology and
confusion, notwithstanding the passage of 30+ years since the time of
Knuth (also as exemplified by the person originally responding to
you). Much of what went into the supplements was, in fact, to clear up
this confusion and lay out the more precise terminology.
I think herein lies the rub. Laying out more precise terminology is
fine. However, if you change the meaning of common important words,
like "parsing" it is possible that you will have a hard time
communicating and that people who are earnestly trying will not be
able to understand you. This next paragraph gets to that point.
For instance, the misconception is still quite prevalent that parsing
is something that applies in reference to context-free grammars. It
does not. It applies to simple syntax directed translations (SSDT's)
and can only be applied to grammars, themselves, via an extension of
the grammar (e.g., the bottom-up, top-down canonical extensions
referred to above) to a SSDT. It is a somewhat specialized instance of
the translation problem; that of determining the set T[A] = { n in N:
(m,n) in T for some m in A } of translations for a subset A of a
monoid M to a monoid N.
The generally accepted term of "parsing" *is* used in connection with
grammars (all grammars and not just context-free ones) to indicate how
the grammar induces upon a text a derivation tree. Your ursurping of
this term for SSDT's is acceptable as long as the translation problem
you describe produces something equivalent to a derivation tree
(e.g. a series of marks in the tree indication which symbols in the
input text are associated with which parts of the tree). The creation
of a derivation tree is important, because it shows how the grammar
structures the sentence, which is what distinguishes parsing from
recognition. Moreover, it captures the essence of the problem people
want to solve with parsing, that is translating the input sentence
into a structured form.
So, one sees (even in the literature, such as it exists), a failure to
distinguish between CFGs and SSDT's; the notion that algebraic methods
can't be "applied" because "the result may destroy equivalency" (in
fact, the "equivalency" it destroys is where algebraic methods are
applied to a CFG since the canonical extensions are not preserved by
such transformations; but algebraic methods can certainly be applied
to the SSDT's that the CFG's represent); one sees a failure to
distinguish between the various aspects of what are (mistakenly)
lumped under the single header of "parsing" (that is: enumeration,
choice and emptiness test/recognition); etc.
Now, having read previous posts in this series, I see that you would
like to structure the problem as generating output symbols. That
should be possible. Simply put a unique output symbol at the start
and end of each expression, and also one after each terminal. Those
outputs will clearly indicate the structure of the derivation tree.
Having done that, you should be able to make your algebraic
transformations and still generate the output showing the original
derivation tree. If you do that, you will have captured the "common
notion of parsing" in a way that should free you of claims of
destroying equivalency, because you can recreate the standard result
by simply showing how to construct the derivation treee from your
output symbols (and it will be a linear process to do so). Moreover,
if you can show how to create a derivation tree, you can then create
all those other terms that you thing the other authors have muddled
together.
If you can show how to create a derivation tree using your model, then
you can call that process "parsing", because it will capture the
important elements of the parsing process. If you can't, then don't
be surprised when people find your results counter-intuitive, because
you are using a term that they already understand in a way that
doesn't conform to their understanding. The same way that refering to
automobiles as "bikes" and bicycles as "cars" would make it hard for
others to understand you. And, I assume by the effort you put into
your posts and writings that you would like to be understood.
Hope this helps,
-Chris
******************************************************************************
Chris Clark Internet: christopher.f.clark@xxxxxxxxxxxxxxxxxxxxxx
Compiler Resources, Inc. or: compres@xxxxxxxxxxxxx
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
.
- References:
- CYK & Context-Free Expressions
- From: Rock Brentwood
- CYK & Context-Free Expressions
- Prev by Date: Re: Exercise 2.1.2 from Aho&Ulman The theory of Parsing textbook?
- Next by Date: A question about a LL parser
- Previous by thread: CYK & Context-Free Expressions
- Next by thread: Re: CYK & Context-Free Expressions
- Index(es):
Relevant Pages
|