Re: Problem Parsing Huge XML file using XML::Twig



vikrant wrote:
Hi,
I am trying to parse a huge XML file using XML::Twig. Part of the XML
file is shown below as a sample:
-------------------------------------------------------------------------------------------------------------------------
<?xml version='1.0'?>
<StoreInfo>
<StoreName>AEC</StoreName>
<Products>
<Product>
<ProductID>21CR10.2</ProductID>
<ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
<SupplierID>AEC</SupplierID>
<PurchasePrice>10.99</PurchasePrice>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
<Product>
<ProductID>21CR11.2</ProductID>
<ProductInfo name="abcd" category="xyzd">ARROW</ProductInfo>
<SupplierID>AEC</SupplierID>
<PurchasePrice>10.49</PurchasePrice>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
</Products>
</StoreInfo>
------------------------------------------------------------------------------------------------------------------------------------
Here, the Product tag repeats 2000 times in the original file.

I am able to get the values of ProductID, SupplierID and
PurchasePrice using the following code. But how do I get the values of
the "link" nodes, and the attribute values and text of the ProductInfo
node? I know we can use XPath with XML::Twig, but unfortunately I have
not been able to make it work. So please help me with any document,
link or reference related to it. I searched a lot but failed to find
anything.
-----------------------------------------------------------------------------------------------------------------------------
#!/bin/perl -w
use strict;
use XML::Twig;

my $t= new XML::Twig( TwigHandlers=> { Product => \&product});
$t->parsefile( 'sample.xml');
exit;
sub product
{ my ($t, $product)= @_;
my %product;
$product{id}= $product->field( 'ProductID');
$product{SupplierID}= $product->field( 'SupplierID');
$product{PurchasePrice}= $product->field( 'PurchasePrice');

print "$product{id}: $product{SupplierID} :$product{PurchasePrice}\n";
$product->delete;
}
------------------------------------------------------------------------------------------------------------------------------

'field' is not the only method for getting data out of an element.
In your case you would use:

my $name= $product->first_child( 'ProductInfo')->att( 'name');

my $links= $product->first_child( 'links'); # the element links
my @links= map { $_->text } $links->children( 'link');

The tutorial at http://www.xmltwig.com/xmltwig/tutorial/index.html
(referenced in the README and at the top of the doc of the module)
gives more info about those methods.
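
Putting those calls together, here is a minimal sketch of a complete
handler. It assumes the structure of the sample above; the inline $xml
string stands in for sample.xml, and the @products array and the hash
keys are only there for illustration. The links are fetched with
get_xpath, which takes an XPath-like path relative to the element, as an
alternative to the first_child/children calls:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Inline stand-in for sample.xml: one Product taken from the post.
my $xml = <<'XML';
<?xml version='1.0'?>
<StoreInfo>
<StoreName>AEC</StoreName>
<Products>
<Product>
<ProductID>21CR10.2</ProductID>
<ProductInfo name="abc" category="xyz">HUGE</ProductInfo>
<SupplierID>AEC</SupplierID>
<PurchasePrice>10.99</PurchasePrice>
<links>
<link>http://www.example.com</link>
<link>http://www.example2.com</link>
</links>
</Product>
</Products>
</StoreInfo>
XML

my @products;    # illustration only: keeps the extracted values around

my $t = XML::Twig->new( twig_handlers => { Product => \&product } );
$t->parse($xml);    # use parsefile('sample.xml') for the real file

sub product {
    my ( $t, $product ) = @_;

    my $info = $product->first_child('ProductInfo');
    my %p = (
        id       => $product->field('ProductID'),
        name     => $info->att('name'),        # attribute values
        category => $info->att('category'),
        info     => $info->text,               # text of ProductInfo
        # path is relative to the current Product element
        links    => [ map { $_->text } $product->get_xpath('links/link') ],
    );
    push @products, \%p;

    print "$p{id}: $p{name} / $p{category} / $p{info}: @{ $p{links} }\n";

    $product->delete;    # as in the original code: free the element
}
```

For the real 2000-product file you would probably drop @products and
just print or store each record inside the handler, so that memory use
stays flat.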

One strange thing I found accidentally is that when I remove the
"StoreInfo" tag from the above XML, the following error appears on
screen.
Error:-
junk after document element at line 5, column 0, byte 53 at
/usr/lib/perl5/site_perl/5.8.8/i386-linux-thread-multi/XML/Parser.pm line 187

If you remove the StoreInfo tag, the document no longer has a single
root element. The parser sees <StoreName>AEC</StoreName> as the entire
document, then dies, with an appropriate error message, when it finds
the rest of your original document: it has no way of dealing with it,
as it has already seen a complete tree.
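
A small sketch that reproduces this, assuming a two-element fragment
made up for the purpose (the $fragment and $wrapped strings are not from
your file). The parse is wrapped in eval so the script can report the
error instead of dying:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Two top-level elements: not a well-formed XML document.
my $fragment = "<StoreName>AEC</StoreName>\n<Products></Products>\n";

# The same content wrapped in a single root element.
my $wrapped = "<StoreInfo>\n$fragment</StoreInfo>\n";

# parse() dies on a well-formedness error, so trap it with eval.
my $fragment_ok = eval { XML::Twig->new->parse($fragment); 1 };
my $fragment_error = $fragment_ok ? '' : $@;

my $wrapped_ok = eval { XML::Twig->new->parse($wrapped); 1 };

print "fragment: ", ( $fragment_ok ? "parsed\n" : "failed: $fragment_error\n" );
print "wrapped:  ", ( $wrapped_ok  ? "parsed\n" : "failed: $@\n" );
```

The fragment should fail with the same "junk after document element"
message, while the wrapped version parses.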

--
mirod