Re: help with pyparsing



On Dec 9, 11:01 pm, Prabhu Gurumurthy <pguru...@xxxxxxxxx> wrote:

All,

I have the following lines that I would like to parse in python using
pyparsing, but have some problems forming the grammar.

Line in file:
table <ALINK> const { 207.135.103.128/26, 207.135.112.64/29 }
table <INTRANET> persist { ! 10.200.2/24, 10.200/22 }
table <RFC_1918> const { 192.168/16, ! 172.24.1/29, 172.16/12, 169.254/16 }
table <DIALER> persist { 10.202/22 }
table <RAVPN> const { 10.206/22 }
table <KS> const { \
10.205.1/24, \
169.136.241.68, \
169.136.241.70, \
169.136.241.71, \
169.136.241.72, \
169.136.241.75, \
169.136.241.76, \
169.136.241.77, \
169.136.241.78, \
169.136.241.79, \
169.136.241.81, \
169.136.241.82, \
169.136.241.85 }

I have the following grammar defn.

tableName = Word(alphanums + "-" + "_")
leftClose = Suppress("<")
rightClose = Suppress(">")
key = Suppress("table")
tableType = Regex("persist|const")
ip4Address = OneOrMore(Word(nums + "."))
ip4Network = Group(ip4Address + Optional(Word("/") +
OneOrMore(Word(nums))))
temp = ZeroOrMore("\\" + "\n")
tableList = OneOrMore(Optional("\\") |
ip4Network | ip4Address | Suppress(",") | Literal("!"))
leftParen = Suppress("{")
rightParen = Suppress("}")

table = key + leftClose + tableName + rightClose + tableType + \
leftParen + tableList + rightParen

I cannot seem to match the sixth table in the file above, i.e. the one
named KS. How do I form the grammar for it? BTW, I still cannot seem to
ignore comments using table.ignore(Literal("#") + restOfLine); I get a
parse error.

Any help appreciated.
Thanks
Prabhu

Prabhu -

This is a good start, but here are some suggestions:

1. ip4Address = OneOrMore(Word(nums + "."))

Word(nums+".") will read any contiguous set of characters in the
string nums+".", so OneOrMore is not necessary for reading in an
ip4Address. Just use:

ip4Address = Word(nums + ".")
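A quick way to check this is to parse a single address on its own. This is only a minimal sketch; the variable names `whole` and `prefix` are made up for the illustration:

```python
from pyparsing import Word, nums

# One Word is enough: it consumes the whole run of digits and dots.
ip4Address = Word(nums + ".")

whole = ip4Address.parseString("169.136.241.68").asList()
# Matching stops at the first character outside nums + ".", here the "/".
prefix = ip4Address.parseString("10.200/22").asList()

print(whole)
print(prefix)
```

Both results are single-token lists, which is why wrapping the Word in OneOrMore adds nothing.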


2. ip4Network = Group(ip4Address + Optional(Word("/") +
OneOrMore(Word(nums))))

Same comment: OneOrMore is not needed for the "/nn" part that follows
the ip4Address, since Word(nums) already reads the whole run of digits:

ip4Network = Group(ip4Address + Optional(Word("/") + Word(nums)))


3. tableList = OneOrMore(Optional("\\") |
ip4Network | ip4Address | Suppress(",") |
Literal("!"))

The list of ip4Networks is just a comma-delimited list, with some
entries preceded with a '!' character. It is simpler to use
pyparsing's built-in helper, delimitedList, as in:

tableList = Group( delimitedList(Group("!"+ip4Network)|ip4Network) )
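To see what delimitedList gives you, here is a small sketch that parses just the contents of the INTRANET table from the original post. The name `entries` is only for the illustration:

```python
from pyparsing import Group, Optional, Word, delimitedList, nums

ip4Address = Word(nums + ".")
ip4Network = Group(ip4Address + Optional(Word("/") + Word(nums)))
# Each entry is either a negated network ("!" + network) or a plain network.
# delimitedList handles the commas and drops them from the results.
tableList = Group(delimitedList(Group("!" + ip4Network) | ip4Network))

entries = tableList.parseString("! 10.200.2/24, 10.200/22").asList()
print(entries)
# [[['!', ['10.200.2', '/', '24']], ['10.200', '/', '22']]]
```

The negated entry keeps its '!' in its own group, so it is easy to tell apart from the plain networks afterwards.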


Yes, I know, you are saying, "but what about all those backslashes?"
The backslashes look like they are just there as line continuations.
We can define an ignore expression, so that the table expression, and
all of its contained expressions, will ignore '\' characters as line
continuations:

table.ignore( Literal("\\") + LineEnd() )
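As a cut-down illustration of the ignore expression, the sketch below parses only a brace-delimited list of addresses with continuation lines. The names `block`, `text` and `addresses` are invented for this example and are not part of the grammar in this thread:

```python
from pyparsing import LineEnd, Literal, Suppress, Word, delimitedList, nums

ip4Address = Word(nums + ".")
block = Suppress("{") + delimitedList(ip4Address) + Suppress("}")
# Treat a backslash followed by end-of-line as something to skip over.
# ignore() is passed down to the expressions contained in block.
block.ignore(Literal("\\") + LineEnd())

text = "{ \\\n169.136.241.68, \\\n169.136.241.70 }"
addresses = block.parseString(text).asList()
print(addresses)
# ['169.136.241.68', '169.136.241.70']
```

The grammar itself never mentions the backslashes; they are skipped wherever they appear between tokens.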

And I'm not sure why you had trouble with ignoring '#' + restOfLine;
it works fine in the program below.
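For what it is worth, here is a small sketch of the comment-skipping on its own. It matches only the "table <NAME>" part, and the sample text with its comments is made up for the illustration:

```python
from pyparsing import (Literal, OneOrMore, Suppress, Word, alphanums,
                       restOfLine)

tableName = Word(alphanums + "-" + "_")
header = Suppress("table") + Suppress("<") + tableName + Suppress(">")
headers = OneOrMore(header)
# Skip everything from "#" to the end of the line, wherever it shows up.
headers.ignore(Literal("#") + restOfLine)

text = "# dialer pool\ntable <DIALER>  # inline comment\ntable <RAVPN>\n"
names = headers.parseString(text).asList()
print(names)
# ['DIALER', 'RAVPN']
```

One thing worth checking if you still get a parse error: ignore() has to be called on the expression you actually call parseString on, or on something it contains.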

If you make these changes, your program will look something like this:

from pprint import pprint
from pyparsing import (Group, LineEnd, Literal, OneOrMore, Optional, Regex,
        Suppress, Word, alphanums, delimitedList, nums, restOfLine)

tableName = Word(alphanums + "-" + "_")
leftClose = Suppress("<")
rightClose = Suppress(">")
key = Suppress("table")
tableType = Regex("persist|const")
ip4Address = Word(nums + ".")
ip4Network = Group(ip4Address + Optional(Word("/") + Word(nums)))
tableList = Group(delimitedList(Group("!"+ip4Network)|ip4Network))
leftParen = Suppress("{")
rightParen = Suppress("}")

table = key + leftClose + tableName + rightClose + tableType + \
        leftParen + tableList + rightParen
# skip backslash line continuations and '#' comments
table.ignore(Literal("\\") + LineEnd())
table.ignore(Literal("#") + restOfLine)

# parse the input and pprint the results
# (line is assumed to hold the text of the table definitions shown above)
result = OneOrMore(table).parseString(line)
pprint(result.asList())

Prints out:
['ALINK',
'const',
[['207.135.103.128', '/', '26'], ['207.135.112.64', '/', '29']],
'INTRANET',
'persist',
[['!', ['10.200.2', '/', '24']], ['10.200', '/', '22']],
'RFC_1918',
'const',
[['192.168', '/', '16'],
['!', ['172.24.1', '/', '29']],
['172.16', '/', '12'],
['169.254', '/', '16']],
'DIALER',
'persist',
[['10.202', '/', '22']],
'RAVPN',
'const',
[['10.206', '/', '22']],
'KS',
'const',
[['10.205.1', '/', '24'],
['169.136.241.68'],
['169.136.241.70'],
['169.136.241.71'],
['169.136.241.72'],
['169.136.241.75'],
['169.136.241.76'],
['169.136.241.77'],
['169.136.241.78'],
['169.136.241.79'],
['169.136.241.81'],
['169.136.241.82'],
['169.136.241.85']]]
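
The output above is one flat list (name, type, entries, name, type, entries, ...). If you would rather get one record per table, one possible variation is to wrap the table expression in a Group and attach results names. This is only a sketch: the names "name", "kind" and "entries" are my own choices, and the two-table input is cut down from the original post:

```python
from pyparsing import (Group, OneOrMore, Optional, Regex, Suppress, Word,
                       alphanums, delimitedList, nums)

tableName = Word(alphanums + "-" + "_")
tableType = Regex("persist|const")
ip4Address = Word(nums + ".")
ip4Network = Group(ip4Address + Optional(Word("/") + Word(nums)))
tableList = Group(delimitedList(Group("!" + ip4Network) | ip4Network))

# Group keeps each table's tokens together; expr("label") attaches a
# results name (the labels here are hypothetical, pick whatever you like).
table = Group(Suppress("table") + Suppress("<") + tableName("name")
              + Suppress(">") + tableType("kind")
              + Suppress("{") + tableList("entries") + Suppress("}"))

text = ("table <ALINK> const { 207.135.103.128/26, 207.135.112.64/29 }\n"
        "table <DIALER> persist { 10.202/22 }\n")
tables = OneOrMore(table).parseString(text)

# One (name, type, number of entries) tuple per table.
summary = [(t["name"], t["kind"], len(t["entries"])) for t in tables]
print(summary)
# [('ALINK', 'const', 2), ('DIALER', 'persist', 1)]
```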

-- Paul