Re: CYK & Context-Free Expressions
- From: Rock Brentwood <markwh04@xxxxxxxxx>
- Date: Sat, 3 May 2008 09:17:37 -0700 (PDT)
No, Chris. Parsing theory is generally stated as an adjunction of the
general theory of syntax-directed translation -- not as an application
of language recognition. All parsers in common use work as transducer
with SSDT's or SDT's (YACC being the archtypical example); few ever
produce trees as output, though they generally can. All standard
references relate parsing theory as an adjunct of the more general
theory of syntax directed translation (and parsers to push down
transducers or more general transduers), both as a special case of the
translation problem and as its essential core.
One does not generally take it to be something that applies merely to
grammars. Equivalence transformations on grammars produce inequivalent
parsing problems. The equivalence is with SSDT's (and more generally,
with transductions) not with grammars. The algebraic properties/
representations and transformations (as made especially clear by the
algebraic formulation of the LR paradigm) are always in reference to
SSDT's, not CFG's [Note 1]. When you draw the confusion between
recognition and parsing or treat parsing as meaning anything less than
translation at its essence, you only run into trouble and the above-
mentioned cloud of vagueness ... particularly on the question of what
"transformations" or "equivalences" apply (and what underlying
algebra) when and where. Consequently, the distinction is usually
drawn between parsing and recognition, the former equated as an
essential core of the more general problem of translation that it is a
special case of.
[Note 1]:
Except that you may regard as SSDT, itself, as a context-free grammar
over the product monoid that the transduction is over.
In contrast, equivalence DOES apply to transductions. So do algebraic
methods, such as those described in the notes in the archive your
posts are in reference to.
The parsing problem is generally regarded in our field, computer
science, as an inextricable part of the more general translation
problem; the latter is not regarded as a usurpation of the former, but
as its logical conclusion. Finding a "structure" is NOT generally
regarded by anyone as the essence of the problem, but merely as one
means (though not the only one) of acquiring an understanding or
translation of the input. Parsing, even in common usage, means to take
it apart and put it back together so as to understand (i.e.
translation); and to interpret or re-render (e.g., Clinton parsing
with the word "is").
In well-written texts, you will see a clear distinction made between
recognition and parsing/translation; between grammars and
transductions with the latter being the domain of parsing theory.
Whether the translation is to trees or any of its linearized
equivalents or to something else and more general (as it usually is,
with most parsers) is immaterial. It's still translation that is the
essence of what's going on here.
Recognition is not parsing, as you suggested. In particular, it is not
O(n^2) (nor is anything like this a "standard proof" in the first
place to be "contradicted" by anyone). CYK is not O(n^3), except on
grammars whose right hand sides are at most quadratic. The ability to
transform a grammar to Chomsky normal form, however, is irrelevant
unless using CYK only for recognition. In that case, however, it and
dynamic programming are superseded by the shortest paths paradigm.
The parsing problem pertains to the actual "structure" of the system
comprising the transduction -- which is what is essentially captured
by whatever SSDT is used (not the grammar). Grammar equivalence does
not equate SSDT equivalence. It is the latter which parsing theory is
concerned with, not the former. So it makes a difference whether a
grammar has a maximum degree of k for expressions on the right-hand
side of its rules, since the SSDT associated with it is different than
the SSDT associated with the Chomsky normal form of the grammar. CYK
on such grammars will then be O(n^{k+1}). No upper bound to k exists,
likewise to CYK.
The problem you apparently posed ("apparent" because you're being
vague):
PARSE: "produce, one-by-one, to exhaustion all the outputs
corresponding to a given input" [be it tree-form, linearized, or
otherwise]
is, likewise, not well-defined and is therefore NOT generally regarded
as what parsing means, since there is no well-defined "what" here for
it to be of, in the first place.
There is, in fact, no "standard" understanding on what the "parsing"
in this sense actually refers to; particularly because of its ill-
definedness. One generally takes the problem to be that of producing
some sort of characterization of the set of all outputs corresponding
to a given input, whether the outputs be listed as parse trees,
forests, words sets of words or otherwise. The differences are
inessential between whether one regards a parse tree as a 2-
dimensional structure or a string written in one of its canonical
forms.
Invariably this forces the split of the so-called "produce trees to
exhaustion" formulation into either (a) an indexing (enumeration), (b)
a listing (parsing) or (c) an extraction (choice ... special case of
enumeration).
Much has evolved over the years and various refinements have been
taken to the naive statement to indicate what the "parsing" problem is
actually referring to. This all hinges on what manner of
representation is adopted; the most general case (context-free
expressions) subsuming the translation problem in its entirety. To the
degree the problem is clearly stated, and to the degree that
references make clear, the parsing problem is, indeed, therefore an
adjunct and inextricable part of the general problem of transduction.
Now … on your attitude:
And, I assume by the effort you put into your posts and writings that you would like to be understood.
You're way off the mark and way the out of line with all of this and
the rest of your post.
And you need to be a little bit more sparing with the haughtiness.
Remember: you're the outsider here, not the other way around.
Moreover, this is advice that would be more appropriate for you to
take for yourself. You would cause fewer needless misunderstandings if
you'd simply try to properly communicate (as opposed to, say, talking
and thinking like an engineer or software developer whose
communication (dis)abilities are rather well known); to be less vague,
yourself; and a little more sparing with calling things "standard"
where you apparently have little or no knowledge of what actually is
or not.
(e.g. a series of marks in the tree indication which symbols in the input text are associated with which parts of the tree).
And if you're not well-enough acquainted with the general subject to
know how one relates, say, the leftmost parse sequence of a tree with
a tree (as your last post seemed to clearly indicate), and are having
that much trouble understanding the relation and essential equvialence
between parse trees and any of its canonical linearized equivalent
representations, and having difficulty clearly understanding the
difference between recognition and transduction, then I don't think
you should be placing yourself in a position to be presuming to
speaking with authority on any of these matters, lecturing us in the
community as an outsider as if you were the inside and they on the
outside (when, in fact, it's more the other way around here). It
might, in fact, do you better off to simply listen for what they may
have to say on this or other matters and perhaps even learn a few
things from others who bave been around much longer than you have and
have much more experience.
The relevant question you probably meant is HOW one enumerates the
specific parse trees out of the set (unless, in fact, you actually
don't know that parse trees are equivalent to word sequences in the
above manners ... in which case the above comments apply). That
problem is no different than (and is, in effect, a special case) of
the problem of enumerating the elements captured by a regular
expression. In turn, this may be regarded as a special case of the
more general problem of query composition.
The enumeraiton of <0| a(a<1| + |1>b)* |0> in the example for cyclic
grammars (where the association are a: S -> 1, b: S -> SS) might
proceed, for instance, in breadth first fashion as one would do with a
regular expression: (1) convert the expression into a FA, (2) start at
the start state, (3) traverse in breadth-first fashion. All-in-all
such a process would be an extension of Tomita. That, of course, is
just one solution to the enumeration problem, the more obvious one.
None of the characterizations of what's "standard" or "prevalent" by
"people" (or by the nameless "they" in your frequent use of the
passive like "is regarded by" [Note 2]) is so. By your own admission,
you're already having difficulty understanding the various
distinctions (say) between parsing and recognition, between grammars
and transductions or made in the material you referenced (which, in
fact, relate to concepts that we generally regard as standard in the
computer science community).
[Note 2]: Passive voice. (Also: "people". If it's not in the list, it
should be.)
http://en.wikipedia.org/Weasel_Words.htm
You would probably won over a few more friends and converts by being a
little more sincere than resorting to "weasel words":
The [sic] generally accepted [by ???] term of "parsing" [sic] *is* used in connection with grammars (all grammars and not just context-free ones) to indicate how the grammar induces upon a text a derivation tree.(See [2] above. Corrections added)
... people [that is: Chris] who are earnestly trying ...
... If you can't, then don't be surprised when people ["people", meaning: Chris and ???]
... if you try to remain relevant to what you're discussing:
However, if you change the meaning of common important words, like "parsing" it is possible(None of this has anything to do with anything you're replying to, and
that you will have a hard time communicating and that people who are earnestly trying will not be
able to understand you.
.. Your [sic] ursurping of this term
... The same way that referring to automobiles as "bikes" and bicycles as "cars", ...
is therefore irrelevant)
The creation of a derivation tree is important, because it shows how the grammar structures the(None of which is relevant since it has nothing to say for or against
sentence, which is what distinguishes parsing from recognition. Moreover, it captures the
essence of the problem people want to solve with parsing, that is translating the input sentence into
a structured form.
anything you're replying to)
... and you would probably win over a few more friends and converts if
you'd lose the haughty attitude. You haven't earned the right to it
here, certainly not with me:
The same way that referring to automobiles as "bikes" and bicycles as "cars" would make it(The correct analogy here is: "The same way that referring to
hard for others to understand you.
automobiles as "four-wheeled motorized transport vehicles" and
bicycles as "two-wheeled muscle-powered transport vehicles", as
opposed to "cars" and "bikes" could initially make it harder for
others (meaning, an outsider who runs a computer/software engineering
firm) to understand what's standard in the field, though it more
clearly captures the essence of both their commonality, and
distinction; as well as their function and purpose in a way that
transcends the paritculars of what they are or how they are
constructed, while also showing relevant and important
generalizations, such as transmission-amplified two-legged, zero-
wheeled transport vehicles, so that we can break out of the
increasingly obsolete mode of thought of the 20th century...")
Laying out more precise terminology is fine.(The correct statement is: "laying out the more precise terminology
that is in common usage and more standard in the field for the benefit
of those of us on the outside, like computer engineers, software
engineers or compiler developers who frequently confuse the issues, as
I've just done here a couple times already, is a good thing...")
for SSDT's is acceptable as long as the translation problem you describe produces something [sic] equivalent to a derivation tree(The correct statement is not "... ought to be equivalent to..." but
"... ought to have, as an inextricable part and important special case
of...")
If you can't, then don't be surprised when people [???](The correct statement that here is: "If the standard treatments given
to these issues can't [conform to my rather restricted and misguided
understanding] then don't be surprised when people [meaning myself and
the "we" that I improperly imagine myself to be speaking on behalf
of]...")
Now, having read previous posts in this series, I see that you would like to structure the problem as generating output[s].(Correction added)
Now, the last I checked, if you were to phrase the question to someone
who is objective about the matter:
-- who is having the trouble communicating when there's a
misunderstanding...
* the engineer
oops. I think we just answered our question there. But going on:
* the engineer (or someone who runs an engineering company) who
remains vague in his language and confuses the issues even in his own
replies
* or the person who is more careful and precise, who receives
compliments on his or her use of language (even in technical articles)
who has studied over 30 human languages over the past 30 years and
more...
... then I think the answer is pretty obvious.
if you want to be understood
And they say Americans have no sense of irony.
.
- Follow-Ups:
- Re: CYK & Context-Free Expressions
- From: Chris F Clark
- Re: CYK & Context-Free Expressions
- From: Chris F Clark
- Re: CYK & Context-Free Expressions
- From: Chris F Clark
- Re: CYK & Context-Free Expressions
- Prev by Date: Re: Maximum Flow
- Next by Date: Re: Mark W. Hopkins theory perspective on parser engine technology?
- Previous by thread: Re: CYK & Context-Free Expressions
- Next by thread: Re: CYK & Context-Free Expressions
- Index(es):
Relevant Pages
|