Re: parsing XML
- From: Jenda@xxxxxxxxxxx (Jenda Krynicky)
- Date: Thu, 25 Jan 2007 16:10:30 +0100
From: Kevin Viel <kviel@xxxxxxxxxxxxxxx>
I have obtained the results of a query in XML format:
<?xml version="1.0"?>
<!DOCTYPE eSummaryResult PUBLIC "-//NLM//DTD eSummaryResult, 29 October 2004//EN"
  "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
<eSummaryResult>
  <DocSum>
    <Id>4609</Id>
    <Item Name="Name" Type="String">MYC</Item>
    <Item Name="Description" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
    <Item Name="Orgname" Type="String">Homo sapiens</Item>
    <Item Name="Status" Type="Integer">0</Item>
    <Item Name="CurrentID" Type="Integer">0</Item>
    <Item Name="Chromosome" Type="String">8</Item>
    <Item Name="GeneticSource" Type="String">genomic</Item>
    <Item Name="MapLocation" Type="String">8q24.12-q24.13</Item>
    <Item Name="OtherAliases" Type="String">c-Myc</Item>
    <Item Name="OtherDesignations" Type="String">avian myelocytomatosis viral oncogene homolog|myc proto-oncogene protein|v-myc avian myelocytomatosis viral oncogene homolog</Item>
    <Item Name="NomenclatureSymbol" Type="String">MYC</Item>
    <Item Name="NomenclatureName" Type="String">v-myc myelocytomatosis viral oncogene homolog (avian)</Item>
    <Item Name="NomenclatureStatus" Type="String">Official</Item>
    <Item Name="TaxID" Type="Integer">9606</Item>
    <Item Name="Mim" Type="List">
      <Item Name="int" Type="Integer">190080</Item>
    </Item>
  </DocSum>
I would like to search for certain keywords and extract all genes in
this query that meet the criteria. Can someone recommend a module? I
looked at XML::Simple::DTDReader.
Yes, that module looks fine. There are of course many other options,
one being XML::Rules. Assuming the <DocSum> element is repeated and you
want to do something with only some of them, you could use something
like:
    use XML::Rules;

    my $parser = XML::Rules->new(
        rules => [
            Id => 'content',
            # From the <Item> tags we are interested in the content
            # and want to use the Name attribute as the key to access
            # that value. We ignore the Type attribute.
            Item => sub { $_[1]->{Name} => $_[1]->{_content} },
            DocSum => sub {
                my ($tag, $attr) = @_;
                # By now all the data from the <Item>s are in the %$attr hash.
                # Skip every gene that is not on chromosome 8 or whose
                # nomenclature name does not contain the word 'viral'.
                # String comparison, since a chromosome may be 'X' or 'Y'.
                if ($attr->{Chromosome} ne '8'
                    or $attr->{NomenclatureName} !~ /\bviral\b/) {
                    return;
                }
                # Do something with the data, or return the part of the
                # data you want to keep, using whatever suits you best
                # as the key.
                return $attr->{Name} => $attr;
            },
            eSummaryResult => 'pass no content',
        ]
    );

    my $data = $parser->parse($the_xml_or_file);
    print $data->{MYC}{NomenclatureName}, "\n";
    __END__
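
Once the parse has run, the kept genes can be walked like any other hash. The sketch below does not call XML::Rules at all. It assumes the rules above leave you with a hash keyed by gene Name, one inner hash per kept <DocSum>; dump the real $data with Data::Dumper to confirm that before relying on it. The hand-built $data and the gene_report() helper are illustrative only and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the structure the rules are assumed to build:
# one entry per kept <DocSum>, keyed by the gene's Name.
my $data = {
    MYC => {
        Id               => 4609,
        Name             => 'MYC',
        Chromosome       => '8',
        MapLocation      => '8q24.12-q24.13',
        NomenclatureName => 'v-myc myelocytomatosis viral oncogene homolog (avian)',
    },
};

# gene_report() is an illustrative helper, not part of XML::Rules.
# It returns one tab-separated line per gene, sorted by gene name.
sub gene_report {
    my ($genes) = @_;
    my @lines;
    for my $name (sort keys %$genes) {
        my $gene = $genes->{$name};
        # Hash slice: pull the three fields we want, in a fixed order.
        push @lines, join "\t", $name,
            @{$gene}{qw(Id MapLocation NomenclatureName)};
    }
    return @lines;
}

print "$_\n" for gene_report($data);
```

Which fields go into the slice is up to you; any Name attribute seen on an <Item> should be usable as a key under the same assumption.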
HTH, Jenda
===== Jenda@xxxxxxxxxxx === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery