Last line issue
- From: andrej.kastrin@xxxxxxxx (Andrej Kastrin)
- Date: Sat, 26 Jan 2008 12:18:55 +0100
Dear all,
To pre-process my XML dataset I run a simple Perl script on it, which extracts the Id from each XML record and prints it in front of the whole record, joined onto one line. For example, the input data looks like:
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>
and the desired output of the script should be:
001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>
But I can't figure out why the script below omits the last record in the input dataset; instead I get:
001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
I'd appreciate any suggestions or pointers.
Best, Andrej
## test.pl ##
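use strict;
use warnings;

my $FNI = shift;
my $FNO = "$FNI.dat";
my $started = 0;
my $chunk;
my @chunk;

open OUT, '>', $FNO or die "Cannot write $FNO: $!";
open IN,  '<', $FNI or die "Cannot read $FNI: $!";
while (<IN>) {
    # trim leading and trailing whitespace
    s/^\s+//;
    s/\s+$//;
    if (m/<Note>/) {
        # a new <Note> starts: emit the record collected before it, if any
        if ($started) {
            my $clob = join("", @chunk);
            process_chunk($clob);
        } else {
            $started = 1;
        }
        @chunk = ();
        push (@chunk, $_);
        # collect the rest of this record, up to and including </Note>
        while (1) {
            $chunk = <IN>;
            last unless defined $chunk;    # stop at end of file
            $chunk =~ s/^\s+//;
            $chunk =~ s/\s+$//;
            push (@chunk, $chunk);
            last if ($chunk =~ m/<\/Note>/);
        }
    }
}
close IN;
close OUT;

# print "Id<TAB>record" for one collected record
sub process_chunk {
    my $clob = shift;
    $clob =~ s/\t+/ /g;
    my $id;
    if ($clob =~ m/<Id>(\d+)<\/Id>/) {
        $id = $1;
    }
    print OUT "$id\t$clob\n";
}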
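The last record is lost because test.pl calls process_chunk() only when it reads the *next* `<Note>` line. After the third record no further `<Note>` arrives, so the outer loop ends with that record still sitting in @chunk. The smallest change is to flush it once after the loop, e.g. `process_chunk(join("", @chunk)) if $started;` just before `close IN;`. The sketch below takes the other route and emits each record as soon as its `</Note>` line has been read, so nothing is left over at the end. It assumes the one-tag-per-line layout of the sample input, and the helper names notes_to_lines() and format_chunk() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reads an open filehandle laid out like the sample
# (one tag per line) and returns one "Id<TAB>record" string per <Note>.
sub notes_to_lines {
    my ($in) = @_;
    my @out;
    while (my $line = <$in>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        next unless $line =~ m{<Note>};    # <NoteSet> does not match
        my @chunk = ($line);
        # Collect lines up to and including </Note>; also stop at EOF so
        # a truncated file cannot cause an endless loop.
        while (defined(my $next = <$in>)) {
            $next =~ s/^\s+//;
            $next =~ s/\s+$//;
            push @chunk, $next;
            last if $next =~ m{</Note>};
        }
        # Emit the record as soon as it is complete instead of waiting
        # for the next <Note>; this is what keeps the last record.
        push @out, format_chunk(join '', @chunk);
    }
    return @out;
}

# Same transformation as process_chunk() in test.pl, but it returns the
# line instead of printing it.
sub format_chunk {
    my ($clob) = @_;
    $clob =~ s/\t+/ /g;
    my ($id) = $clob =~ m{<Id>(\d+)</Id>};
    $id = '' unless defined $id;    # no <Id>: leave the first column empty
    return "$id\t$clob";
}

# Command-line use mirrors test.pl: given input.xml, writes input.xml.dat
if (@ARGV) {
    my $fni = shift @ARGV;
    open my $in,  '<', $fni       or die "Cannot read $fni: $!";
    open my $out, '>', "$fni.dat" or die "Cannot write $fni.dat: $!";
    print {$out} "$_\n" for notes_to_lines($in);
    close $in;
    close $out or die "Cannot close $fni.dat: $!";
}
```

Like test.pl, it takes the input file name as its only argument and writes `<input>.dat` next to it. For input that does not keep one tag per line, an XML parser module such as XML::Twig or XML::LibXML is more robust than line-based matching.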
- Follow-Ups:
- Re: Last line issue
- From: John W. Krahn
- Re: Last line issue