Re: Is pyparsing really a recursive descent parser?
- From: Kay Schluehr <kay.schluehr@xxxxxxx>
- Date: Sun, 04 Nov 2007 01:49:20 -0800
On 4 Nov., 03:07, Neil Cerutti <horp...@xxxxxxxxx> wrote:
On 2007-11-04, Just Another Victim of the Ambient Morality
<ihates...@xxxxxxxxxxx> wrote:
"Neil Cerutti" <horp...@xxxxxxxxx> wrote in message
news:oz3Xi.39812$G23.23308@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On 2007-11-03, Paul McGuire <pt...@xxxxxxxxxxxxx> wrote:
On Nov 3, 12:33 am, "Just Another Victim of the Ambient Morality"
<ihates...@xxxxxxxxxxx> wrote:
It has recursion in it but that's not sufficient to call it a
recursive
descent parser any more than having a recursive implementation of the
factorial function. The important part is what it recurses through...
<snip>
In my opinion, the rule set I mentioned in my original post:
grammar = OneOrMore(Word(alphas)) + Literal('end')
...should be translated (internally) to something like this:
word = Word(alphas)
grammar = Forward()
grammar << ((word + grammar) | (word + Literal(end)))
This allows a recursive search to find the string correct without
any
need for "backtracking," if I understand what you mean by that.
Couldn't
pyparsing have done something like this?
Dear JAVotAM -
This was a great post! (Although I'm not sure the comment in the
first paragraph was entirely fair - I know the difference between
recursive string parsing and recursive multiplication.)
You really got me thinking more about how this recursion actually
behaves, especially with respect to elements such as OneOrMore. Your
original question is really quite common, and up until now, my stock
answer has been to use negative lookahead. The expression you posted
is the first time I've seen this solution, and it *does* work.
Is there not an ambiguity in the grammar?
In EBNF:
goal --> WORD { WORD } END
WORD is '[a-zA-Z]+'
END is 'end'
I think it is fine that PyParsing can't guess what the composer
of that grammar meant.
First, I don't know if that constitutes an ambiguity in the
grammar. 'end' is a word but it's unambiguous that this grammar
must end in a literal 'end'. You could interpret the input as
just a sequence of words or you could interpret it as a
sequence of words terminated by the word 'end'.
Yeah. If it were a regex, it would be: '[ab]+b'. That is a fine
regex, because a regex is generally just a recognizer; the
ambiguity doesn't have to do with recognizing the language. But
PyParsing is parser. The ambiguity is in deciding what's a
Word(alphas) and what's an 'end'.
One interpretation conforms to the grammar while the other
doesn't. You would assume that the interpretation that agrees
with the grammar would be the preferable choice and so should
the program. Secondly, even if it is an ambiguity... so what?
pyparsing's current behaviour is to return a parse error,
pretending that the string can't be parsed. Ideally, perhaps
it should alert you to the ambiguity but, surely, it's better
to return _a_ valid parsing than to pretend that the string
can't be parsed at all...
I wouldn't characterize it as pretending. How would you parse:
hello end hello end
"WORD END WORD END" and "WORD WORD WORD END" are both valid
interpretations, according to the grammar.
As soon as you remove the ambiguity from the grammar, PyParsing
starts to work correctly.
I think you are correct about this. But I'm not sure how much it shall
matter. Just take a look at Pythons Grammar
http://svn.python.org/view/python/trunk/Grammar/Grammar?rev=55446&view=markup
Without special keyword treatment this grammar would be ambigous and
couldn't be parsed using an LL(1) parser. The "grammar compiler" which
builds the parser tables creates a special label for each keyword.
This label is filtered when a NAME token is feeded into the parser.
With the label that belongs to e.g. 'if' or 'while' the correct
statement can be selected in constant time. Same happens when I use
the parser generator with your EBNF grammar. With a little more
adaption also NUMBER token could be filtered. But this would be
overdesign.
Theoretical beauty is compromised here using reasonable default
assumptions for keeping the grammar simple ( "convention over
configuration" to borrow a Rails slogan ).
Tokenization is another issue in Python. It is indeed somewhat special
due to significant whitespace and line continuation but tokenization
is conceptually much simpler and one would likely throw all kinds of
options and special case handling in the lexical analysis phase.
.
- Follow-Ups:
- Re: Is pyparsing really a recursive descent parser?
- From: Neil Cerutti
- Re: Is pyparsing really a recursive descent parser?
- References:
- Is pyparsing really a recursive descent parser?
- From: Just Another Victim of the Ambient Morality
- Re: Is pyparsing really a recursive descent parser?
- From: Marc 'BlackJack' Rintsch
- Re: Is pyparsing really a recursive descent parser?
- From: Paul McGuire
- Re: Is pyparsing really a recursive descent parser?
- From: Just Another Victim of the Ambient Morality
- Re: Is pyparsing really a recursive descent parser?
- From: Paul McGuire
- Re: Is pyparsing really a recursive descent parser?
- From: Neil Cerutti
- Re: Is pyparsing really a recursive descent parser?
- From: Just Another Victim of the Ambient Morality
- Re: Is pyparsing really a recursive descent parser?
- From: Neil Cerutti
- Is pyparsing really a recursive descent parser?
- Prev by Date: Re: how to iterate through each set
- Next by Date: Re: How to know more about an exception?
- Previous by thread: Re: Is pyparsing really a recursive descent parser?
- Next by thread: Re: Is pyparsing really a recursive descent parser?
- Index(es):
Relevant Pages
|