Re: Parsing table in rtf file




Peter Jamieson wrote:
I am trying to extract data from the table in a large number of rtf files.
I tried RTF::Tokenizer and RTF::Parser but could not make progress
so have decided to try regular expressions.

What problem(s) were you having with the RTF modules?

I know looking at RTF can be fun and all, but why hammer out some
regexes to parse RTF when a module already exists for this?

My project is to get the tabular data into a db for further analysis.
My problem is that I cannot see how to parse the data rows so
that they match the correct field headings.

Any advice or suggestions appreciated!

Not familiar with the format's tokens, but from looking at it quickly,
it appears as though the type of token is given after the text
portion, so you can try something like:

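# your subclass of RTF::Parser
# not tested

my $tables = [];
my $cells  = [];
my $rows   = [];

my $token;    # most recent run of text seen by text()

# define tokens ($CELL_END, $ROW_END, $TABLE_END) here...

# called by the parser for each run of plain text
sub text {
    $token = $_[1];
}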


my %do_on_control = (

    '__DEFAULT__' => sub {

        # $type is the control word name; $arg is its numeric parameter, if any
        my ( $self, $type, $arg ) = @_;

        if ( $type eq $CELL_END ) {
            push @$cells, $token;    # was $tok, which is never declared
        }
        elsif ( $type eq $ROW_END ) {
            push @$rows, $cells;
            $cells = [];
        }
        elsif ( $type eq $TABLE_END ) {
            push @$tables, $rows;
            $rows = [];
        }
    }
);

sub parse {
    my ( $self, $file ) = @_;
    $self->control_definition( \%do_on_control );
    open( my $IN, '<', $file ) || die $!;
    $self->parse_stream($IN);
    close($IN);

    return $tables;
}
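
To get the data rows lined up with the field headings, something along these lines might work once the rows have been collected. It is only a sketch: it assumes the first row of each table holds the headings (the original post does not say so), and rows_to_records is a made-up name, not part of RTF::Parser. The sample data is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each data row with the headings row.
# Assumes $rows looks like [ [heading, ...], [cell, ...], ... ],
# i.e. one entry of the $tables structure built above.
sub rows_to_records {
    my ($rows) = @_;
    my ( $header, @data ) = @$rows;
    return [] unless $header;

    my @records;
    for my $row (@data) {
        my %record;

        # Hash slice: heading N maps to cell N; missing cells become undef.
        @record{@$header} = @$row[ 0 .. $#$header ];
        push @records, \%record;
    }
    return \@records;
}

# Invented sample data standing in for one parsed table.
my $sample_rows = [
    [ 'Name', 'Qty' ],
    [ 'bolt', '12' ],
    [ 'nut',  '7' ],
];

my $records = rows_to_records($sample_rows);
print "$_->{Name}: $_->{Qty}\n" for @$records;
```

Each record is then a hash keyed by heading, which should map fairly directly onto a db insert with named columns.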
