Re: Parsing table in rtf file
- From: "Skye Shaw!@#$" <skye.shaw@xxxxxxxxx>
- Date: Sun, 30 Dec 2007 15:07:48 -0800 (PST)
Peter Jamieson wrote:
> I am trying to extract data from the table in a large number of rtf files.
> I tried RTF::Tokenizer and RTF::Parser but could not make progress
> so have decided to try regular expressions.
What problem(s) were you having with the RTF modules?
I know looking at RTF can be fun and all, but why hammer out some
regexes to parse RTF when a module already exists for this?
> My project is to get the tabular data into a db for further analysis.
> My problem is that I cannot see how to parse the data rows so
> that they match the correct field headings.
> Any advice or suggestions appreciated!
I'm not familiar with the format's tokens, but from a quick look it
appears that the control word marking the end of a cell or row comes
after the text portion, so you can try something like:
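# Your subclass of RTF::Parser.
# Not tested.
use strict;
use warnings;

my $tables = [];    # finished tables, each a list of rows
my $rows   = [];    # rows of the table being read
my $cells  = [];    # cells of the row being read
my $token  = '';    # most recent text chunk seen

# Define the control words that end a cell, a row and a table...
my ( $CELL_END, $ROW_END, $TABLE_END );

# Called by the parser for each run of plain text.
sub text {
    $token = $_[1];
}

my %do_on_control = (
    '__DEFAULT__' => sub {
        # $type is the control word itself; $arg is its optional argument.
        my ( $self, $type, $arg ) = @_;
        if ( $type eq $CELL_END ) {
            push @$cells, $token;
            $token = '';    # so an empty cell does not repeat the last text
        }
        elsif ( $type eq $ROW_END ) {
            push @$rows, $cells;
            $cells = [];
        }
        elsif ( $type eq $TABLE_END ) {
            push @$tables, $rows;
            $rows = [];
        }
    },
);

sub parse {
    my ( $self, $file ) = @_;
    $self->control_definition( \%do_on_control );
    open( my $IN, '<', $file ) or die "Cannot open $file: $!";
    $self->parse_stream($IN);
    close($IN);
    return $tables;
}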
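
On matching the data rows to the field headings: once you have the
nested structure that parse() returns (tables, each a list of rows, each
a list of cell strings), a hash slice can pair each cell with its heading.
The following is only a sketch. It assumes the first row of each table
holds the headings, and rows_to_records and the sample data are made-up
names and values for illustration, not anything from RTF::Parser.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one table (array ref of rows, each an array ref
# of cell strings) into a list of hash refs keyed by the first row's cells.
# Assumes the first row holds the field headings.
sub rows_to_records {
    my ($table) = @_;
    my ( $header, @data ) = @$table;
    return () unless $header;

    my @records;
    for my $row (@data) {
        my %record;
        # The hash slice pairs each heading with the cell in the same
        # position; a heading with no matching cell gets undef.
        @record{@$header} = @$row[ 0 .. $#$header ];
        push @records, \%record;
    }
    return @records;
}

# Made-up sample data in the shape the parser sketch above returns.
my $tables = [
    [
        [ 'Name',  'Qty' ],
        [ 'Apple', '3' ],
        [ 'Pear',  '7' ],
    ],
];

for my $table (@$tables) {
    for my $record ( rows_to_records($table) ) {
        print join( ', ', map { "$_=$record->{$_}" } sort keys %$record ), "\n";
    }
}
```

Each record is then a plain hash, so it can be handed to whatever insert
statement your db layer uses, with the headings as column names.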