Re: Parsing: Help on ignoring quoted tokens.



On Jun 1, 7:30 am, paktsardi...@xxxxxxxxx wrote:
> I am writing a (hopefully) simple parser to parse the contents of a
> text file and turn it into some sort of html form. Here's a small
> example:
>
> forms.txt contains something like:
>
> # Registration Form
> registration {
>     numcols:2
>     [heading: Account Details] [ ]
>     [label:"User Name:"] [textbox:username:amcnab:mandatory]
>     [label:"First Name:"] [textbox:first_name:Andy]
>     [label:"Last Name:"] [textbox:last_name:McNab]
>     [label:"Password:"] [passbox:passwd::mandatory]
>
> }
>
> # Error form
> error {
>     numcols:2
>     [heading:Explosion Error!][]
>     [label:"Vent Gas?:"] [select:vent:yes|no:no]
>
> }
>
> where:
> [.*] denotes an html table cell.

[...snip...]

> Now, my question is: what is the best way to approach the parsing of
> this file?

When you say "parse a text file", you are usually dealing with brackets
and/or nested { ... } constructs, and I can clearly see the
"registration { ... }" and "error { ... }" structures in your
file.

I strongly recommend reading perlfaq4 first: "How do I find
matching/nesting anything?"
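For comparison, here is a minimal sketch of one way to match nested braces directly: a recursive regex, which needs Perl 5.10 or later. The group name "group" and the sample string are made up for illustration, and this is not necessarily the approach perlfaq4 recommends.

```perl
use strict;
use warnings;
use v5.10;    # named recursion (?&name) and possessive ++ need Perl 5.10+

# One balanced { ... } group: a "{", then any mix of non-brace runs
# and (recursively) nested groups, then the matching "}".
my $balanced = qr/(?<group>\{(?:[^{}]++|(?&group))*\})/;

my $text = 'registration { numcols:2 { nested } more } trailing';
if ($text =~ $balanced) {
    say $+{group};    # prints "{ numcols:2 { nested } more }"
}
```

The possessive ++ keeps the engine from re-splitting runs of non-brace text, which avoids needless backtracking on unbalanced input.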

However, to keep this simple, I would suggest making a few
assumptions about the structure of your file, thereby effectively
eliminating the inherent nested structure.

These assumptions would be, for example:
- there are no nested { ... } constructs.
- each { ... } construct begins with a single line of the form
  /^\w+\s*{$/ and ends with a single line /^}$/
- inside a { ... } construct, each line begins with whitespace
  (/^\s+/) and consists of cells of the form /\s*\[.*?\]/g
- the first line of cells inside a { ... } construct would be of the
  form /^\s+\[heading:.*?\]\s+\[\s*\]$/
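A minimal sketch of how those assumptions translate into classifying each line. The helper classify_line and its category names are hypothetical, the extra 'comment' category is my addition, and the heading check uses \s* instead of \s+ between the two cells (an assumption) so that [heading:Explosion Error!][] from the error form also fits.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one line of forms.txt according to the
# simplifying assumptions (no nesting, one-line start and end markers).
sub classify_line {
    my ($line) = @_;
    return 'comment'    if $line =~ /^\#/;            # added category
    return 'form_start' if $line =~ /^\w+\s*\{$/;
    return 'form_end'   if $line =~ /^\}$/;
    # \s* (not \s+) between the cells, so "[heading:...][]" also fits
    return 'heading'    if $line =~ /^\s+\[heading:.*?\]\s*\[\s*\]$/;
    return 'cells'      if $line =~ /^\s+(?:\s*\[.*?\])+\s*$/;
    return 'other';     # e.g. "numcols:2" or blank lines
}

for my $line ('registration {', '    [heading: Account Details] [ ]',
              '    [label:"User Name:"] [textbox:username:amcnab:mandatory]',
              '    numcols:2', '}') {
    printf "%-12s %s\n", classify_line($line), $line;
}
```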

This allows you to process the file line by line using only regexes
while still producing valid HTML. At first this solution may seem
oversimplified, but as long as you stay away from nested structures,
you can easily add, remove, or modify regexes in a trial-and-error
fashion as you develop your Perl program from the bottom up.
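One way to keep that add/remove/modify cycle cheap, sketched under the assumption that each line type can be handled independently, is a table of regex/handler pairs. The names @rules and apply_rules are hypothetical, and the handlers here only return labels instead of printing HTML.

```perl
use strict;
use warnings;

# Hypothetical rule table: each entry pairs a regex with a handler, so
# adding, removing or changing a rule means editing a single entry.
my @rules = (
    [ qr/^\#\s*(.*)$/, sub { my ($text) = @_; return "comment: $text" } ],
    [ qr/^\s+\[/,      sub { return 'cells' } ],
    [ qr/^\}$/,        sub { return 'end of form' } ],
);

# Try each rule in order; the first matching regex wins.
sub apply_rules {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $handler) = @$rule;
        # In list context a successful match returns its captures,
        # or (1) if the regex has no capture groups.
        if (my @captures = $line =~ $re) {
            return $handler->(@captures);
        }
    }
    return;    # no rule matched
}

print apply_rules('# Registration Form'), "\n";   # comment: Registration Form
```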

Here is how I would start the bottom-up approach with your test file:

==============================
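use strict;
use warnings;

my $inputfile = 'forms.txt';
open my $inp, '<', $inputfile
    or die "Error 0010: open < '$inputfile': $!";

# Text of the last "# ..." comment line; non-empty means the next
# cell line must be the heading of a new form.
my $comment = '';
while (<$inp>) {
    chomp;

    # A comment line such as "# Registration Form" names the next form.
    if (m{^\#\s*(.*)$}xms) {
        $comment = $1;
    }

    # An indented line of [...] cells.
    if (m{^\s+\[}xms) {
        my @td = m{\[(.*?)\]}gxms;

        # The first cell line after a comment must be [heading:...] [ ];
        # it opens the table.
        if ($comment ne '') {
            if (@td != 2
                or $td[0] !~ m{^heading:(.*)$}xms) {
                die "Error 0020: unexpected '$_'";
            }
            # $1 holds the heading text captured by the match above.
            print "<h2>$1 ($comment)</h2>\n";
            print "<table>\n";
            $comment = '';
            next;
        }

        # Any other cell line becomes one table row; empty cells get &nbsp;.
        print " <tr>\n";
        for my $element (@td) {
            if ($element =~ m{^\s*$}xms) {
                print " <td>&nbsp;</td>\n";
            }
            else {
                print " <td>$element</td>\n";
            }
        }
        print " </tr>\n";
        next;
    }

    # A line starting with "}" closes the current table.
    if (/^}/xms) {
        print "</table>\n";
        $comment = '';
        next;
    }
}

close $inp;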
==============================
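Since your subject line asks about ignoring quoted tokens: if a label ever contains a "]" inside its quotes, e.g. [label:"Array[0]:"], the cell regex m{\[(.*?)\]}g in the script above cuts the cell short. A minimal sketch of a cell regex that skips over double-quoted text, assuming quotes are plain "..." with no escaped quotes inside (the sample line is made up):

```perl
use strict;
use warnings;

# Cell regex that treats "..." as opaque, so a "]" inside quotes does not
# end the cell. Assumes quotes are plain "..." with no escaped quotes.
my $cell = qr/\[((?:"[^"]*"|[^\]"])*)\]/;

my $line = '    [label:"Array[0]:"] [textbox:first:x]';
my @naive  = $line =~ m{\[(.*?)\]}g;   # the script's original cell regex
my @quoted = $line =~ m{$cell}g;

print "naive:  @naive\n";    # naive:  label:"Array[0 textbox:first:x
print "quoted: @quoted\n";   # quoted: label:"Array[0]:" textbox:first:x
```

The pattern contains no literal whitespace, so it should also work unchanged under the /x flag the script uses.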

This approach is very flexible and scales well: I have already used it
successfully to transform a plain old schema listing of a mainframe
database from basic ASCII into HTML.

> Bonus points if your answer makes no reference to lex or yacc. :)

Thanks for the bonus points :-)

--
Klaus
