Re: Parsing table in rtf file




"Skye Shaw!@#$" <skye.shaw@xxxxxxxxx> wrote in message
news:f51eccde-8c5d-444a-8cb0-bcdefe81c399@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Peter Jamieson wrote:
I am trying to extract data from the table in a large number of RTF
files. I tried RTF::Tokenizer and RTF::Parser but could not make
progress, so I have decided to try regular expressions.

What problem(s) were you having with the RTF modules?

I know looking at RTF can be fun and all, but why hammer out some
regexes to parse RTF when a module already exists for this?
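For context, a simple RTF table row looks roughly like the `$rtf` string below. Each cell's text is terminated by the `\cell` control word and the row by `\row`, with `\trowd` starting the row definition. The sketch splits such a row using only regular expressions. It assumes flat, hand-written markup: no nested groups, no escaped braces, and no `\'xx` hex escapes. Real files from Word carry far more formatting, which is why a proper tokenizer is the safer route for arbitrary input. The sample row and the helper name `cells_from_row` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified RTF for one table row with two cells.
my $rtf = '{\trowd\cellx2000\cellx4000'
        . '\pard\intbl Name\cell Age\cell\row}';

# Crude regex-only extraction of the cell texts from one row.
sub cells_from_row {
    my ($row) = @_;

    # Keep only the part before the \row terminator.
    ($row) = split /\\row\b/, $row;

    # Every cell's text ends with \cell. The piece after the last \cell
    # is not a cell, so drop it (limit -1 keeps empty cells in between).
    my @cells = split /\\cell\b/, $row, -1;
    pop @cells;

    for (@cells) {
        s/\\[a-z]+-?\d* ?//g;    # strip control words and their space delimiter
        tr/{}//d;                # strip group braces
        s/^\s+|\s+$//g;          # trim surrounding whitespace
    }
    return @cells;
}

print join( ' | ', cells_from_row($rtf) ), "\n";    # prints "Name | Age"
```

Note that `\b` keeps `\cellx2000` (a cell-boundary position, not a cell end) from being mistaken for `\cell`. That is exactly the kind of detail a tokenizer handles for you.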

My project is to get the tabular data into a db for further analysis.
My problem is that I cannot see how to parse the data rows so
that they match the correct field headings.

Any advice or suggestions appreciated!
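Once the cells of each row have been collected, lining the data up with the field headings can be as simple as treating the first row as the header and building one hash per data row. This is a minimal sketch. It assumes the table has a single header row and that every data row lists its cells in the same order as the headings. Merged or missing cells would need extra handling. The helper name `rows_to_records` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Turn ( [headings...], [row1...], [row2...] ) into a list of hash refs
# keyed by heading. Assumes the first row holds the field headings.
sub rows_to_records {
    my @rows     = @_;
    my @headings = @{ shift @rows };
    my @records;
    for my $row (@rows) {
        my %record;
        # Pair each heading with the cell in the same position. A short
        # row leaves the missing fields undef; extra cells are ignored.
        @record{@headings} = @$row[ 0 .. $#headings ];
        push @records, \%record;
    }
    return @records;
}

my @records = rows_to_records(
    [ 'Name',  'Age' ],
    [ 'Alice', 30 ],
    [ 'Bob',   25 ],
);
print "$_->{Name} is $_->{Age}\n" for @records;
```

Keying by heading rather than by position means a later change in column order only has to be fixed in the header row.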

Not familiar with the format's tokens, but from a quick look it
appears that the control word saying what a piece of text is (end of
a cell, end of a row) comes after the text itself, so you can try
something like:

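#your subclass of RTF::Parser
#not tested
package RTF::TableGrabber;    # hypothetical package name
use strict;
use warnings;
use base 'RTF::Parser';

my $rows  = [];    # finished rows, each an array ref of cell strings
my $cells = [];    # cells of the row currently being read
my $token = '';    # text of the cell currently being read

# RTF ends a cell with \cell and a row with \row; \trowd starts a row
# definition. There is no control word marking the end of a table,
# so rows are simply collected in document order.
my $CELL_END  = 'cell';
my $ROW_END   = 'row';
my $ROW_START = 'trowd';

# Text may arrive in several chunks per cell, so append rather than
# overwrite.
sub text {
    my ( $self, $text ) = @_;
    $token .= $text;
}

my %do_on_control = (
    '__DEFAULT__' => sub {
        my ( $self, $type, $arg ) = @_;

        # $type is the control word's name (e.g. 'cell' for \cell);
        # $arg is its numeric parameter, if any.
        if ( $type eq $CELL_END ) {
            push @$cells, $token;
            $token = '';
        }
        elsif ( $type eq $ROW_END ) {
            push @$rows, $cells;
            $cells = [];
            $token = '';
        }
        elsif ( $type eq $ROW_START ) {
            # Discard any non-table text seen before this row.
            $token = '';
        }
    },
);

sub parse {
    my ( $self, $file ) = @_;
    ( $rows, $cells, $token ) = ( [], [], '' );    # reset between files
    $self->control_definition( \%do_on_control );
    open( my $in, '<', $file ) or die "Cannot open $file: $!";
    $self->parse_stream($in);
    close($in);
    return $rows;
}

1;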
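To take the collected rows into a database, a common pattern with Perl's DBI module is a single `INSERT` statement with `?` placeholders, prepared once and executed once per row. The sketch below only builds the statement and the per-row bind values, so it runs without a database. The table name `results`, the helper name `insert_plan` and the sample rows are hypothetical. It also assumes the heading text is safe to use as SQL column names; real headings may need quoting or cleaning first.

```perl
use strict;
use warnings;

# Build an INSERT statement with one ? placeholder per column, plus one
# array ref of bind values per data row (in heading order).
sub insert_plan {
    my ( $table, $headings, @rows ) = @_;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join( ', ', @$headings ),
        join( ', ', ('?') x @$headings );
    my @binds = map { [ @$_[ 0 .. $#$headings ] ] } @rows;
    return ( $sql, @binds );
}

my ( $sql, @binds ) = insert_plan( 'results', [ 'Name', 'Age' ],
    [ 'Alice', 30 ], [ 'Bob', 25 ] );
print "$sql\n";

# With a connected DBI handle $dbh this would become, for example:
#   my $sth = $dbh->prepare($sql);
#   $sth->execute(@$_) for @binds;
```

Placeholders let the driver handle quoting of cell text, which matters when RTF cells contain apostrophes or commas.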


Thanks for the input Skye!
I read up on all I could find about the RTF parsing and tokenizing
modules and came to the conclusion that they were good for text data
but not well suited to tabular data. However, I would be more than
happy to be proven wrong! I can get the header and footer info from
the RTF files into a db OK, but could not make progress with the
tabular data. The sticking point was getting the data rows to line up
with the field headings. I had previously used VBA code in MS Excel
and MS Word for this project, but file bloat and unreliability have me
searching for a Perl solution.
I will have a close look at your suggestions asap.
Thanks for your help... very much appreciated! All the best for 2008!
Cheers, Peter

