Last line issue



Dear all,

To pre-process my XML dataset I run a simple Perl script on it, which extracts the Id identifier from each XML record and prepends it to the whole record, flattened onto one line. For example, the input data looks like:

<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>

and the desired output of the script should be:

001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>

But I can't figure out why the script below omits the last record of the input dataset; the actual output is:

001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>

I'd appreciate any suggestions or pointers.
Best, Andrej


## test.pl ##
use strict;
my $FNI = shift;
my $FNO = "$FNI.dat";
my $started = 0;
my $chunk;
my @chunk;

open OUT, ">$FNO";
open IN, "$FNI";
while (<IN>) {
    s/^\s+//g;
    s/\s+$//g;
    if (m/\<Note>/) {
        if ($started) {
            my $clob = join("", @chunk);
            &process_chunk($clob);
        } else {
            $started = 1;
        }
        @chunk = ();
        push (@chunk, $_);
        while (1) {
            $chunk = <IN>;
            $chunk =~ s/^\s+//g;
            $chunk =~ s/\s+$//g;
            push (@chunk, $chunk);
            last if ($chunk =~ m/\<\/Note>/);
        }
    }
}
close IN;
close OUT;

sub process_chunk {
    my $clob = shift;
    $clob =~ s/\t+/ /g;
    my $id;
    if ($clob =~ m/\<Id>(\d+)\<\/Id>/) {
        $id = $1;
    }
    print OUT "$id\t$clob\n";
}
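
A likely cause, reading the script above: process_chunk is only called when the *next* <Note> line is seen, so the chunk collected for the final record is never flushed before the loop ends. Below is a minimal sketch of one possible fix, which emits each record as soon as its closing </Note> is read. The helper name notes_to_lines and the inline sample data are assumptions made for illustration and are not part of the original script; it works on a list of lines rather than on the IN/OUT file handles.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): takes the input lines
# and returns one "Id<TAB>record" string per <Note> element. Each record is
# flushed when its closing </Note> is seen, so the last one is not lost.
sub notes_to_lines {
    my @lines = @_;
    my (@out, @chunk);
    my $in_note = 0;
    for my $line (@lines) {
        $line =~ s/^\s+//;    # trim leading whitespace
        $line =~ s/\s+$//;    # trim trailing whitespace, including the newline
        $in_note = 1 if $line =~ /<Note>/;
        next unless $in_note; # skip anything outside a <Note> element
        push @chunk, $line;
        if ($line =~ m{</Note>}) {
            my $clob = join '', @chunk;
            $clob =~ s/\t+/ /g;
            my ($id) = $clob =~ m{<Id>(\d+)</Id>};
            push @out, (defined $id ? $id : '') . "\t$clob";
            @chunk   = ();
            $in_note = 0;
        }
    }
    return @out;
}

# Sample data copied from the post, used only to demonstrate the helper.
my @sample = split /\n/, <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>
XML

print "$_\n" for notes_to_lines(@sample);
```

The smallest change to the original script would be to call process_chunk(join("", @chunk)) once more after the outer while loop (before close OUT), or to call it right after the inner while (1) loop and drop the $started bookkeeping. Checking that `<IN>` returned a defined value inside the inner loop would also avoid an endless loop on a truncated file.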
