Re: parsing XML



From: Kevin Viel <kviel@xxxxxxxxxxxxxxx>
Jenda Krynicky kindly provided:

use XML::Rules;

my $parser = XML::Rules->new(
    rules => [
        Id => 'content',
        # from the <Item> tags we are interested in the content
        # and want to use the Name attribute as the key to access
        # that value. We ignore the Type attribute.
        Item => sub { $_[1]->{Name} => $_[1]->{_content} },
        DocSum => sub {
            # by now all the data from the <Item>s are in the %{$_[1]} hash

            if ($_[1]->{Chromosome} != 8
                or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
                # skip anything that is not on the 8th chromosome
                # or is not 'viral'
                return;
            }

            # do something with the data
            # or return the part of the data you want to keep, using
            # whatever suits you best as the key
            return $_[1]->{Name} => $_[1];
        },
        eSummaryResult => 'pass no content',
    ]
);

my $data = $parser->parse($the_xml_or_file);

print $data->{MYC}{NomenclatureName}, "\n";
__END__

I'd like to understand this better. It seems to be a reference
(little arrow). Is that the same as using \@referenced_array, for
instance?

Assuming you use the code above as is you end up with a reference to
a HoH in $data. The first level of keys will be the Names of the
genes (or whatever's the content of the <DocSum> tags), the second
level will be the values of the Name attributes from the <Item> tags.
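
On the arrow itself, a minimal sketch in core Perl may help. The gene data below is made up for illustration and is not taken from a real result. The backslash creates a reference to an existing variable, while "->" follows a reference to the thing it points at. The "=>" used inside the rules is a different operator, the so-called fat comma, which only separates a key from its value.

```perl
use strict;
use warnings;

# Hypothetical data, shaped like what the parser above is expected to return.
my %genes = (
    MYC => { Name => 'MYC', Chromosome => 8 },
);

my $data = \%genes;    # backslash: take a reference to a named hash
my @list = (1, 2, 3);
my $aref = \@list;     # same idea for an array

# "->" follows a reference; between two subscripts it may be omitted.
my $chr_long  = $data->{MYC}->{Chromosome};
my $chr_short = $data->{MYC}{Chromosome};
my $second    = $aref->[1];

print "$chr_long $chr_short $second\n";    # prints "8 8 2"
```

So $data->{MYC}{NomenclatureName} in the script reads as: follow the reference in $data, take the MYC entry, then take NomenclatureName from the inner hash.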

You may want to run the script on a short XML and print the returned
data structure with

use Data::Dumper;
print Dumper($data);
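
A slightly fuller sketch of the same idea, assuming a small hand-built stand-in for $data (the real structure depends on your XML). The two settings are optional Data::Dumper configuration variables that make the output easier to read and to compare between runs.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order
$Data::Dumper::Indent   = 1;    # simpler, more compact indentation

# Hypothetical stand-in for the $data returned by the parser.
my $data = { MYC => { Name => 'MYC', Chromosome => 8 } };

my $dump = Dumper($data);
print $dump;
```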

It seems to be a hash with the key "rules" and a four-item array as
its value. The third item of this array is a hash with a subroutine,
or anonymous function declaration, as its value.

The constructor of the XML::Rules object accepts several named
arguments, the most important being "rules". It's a reference to
either an array or a hash containing the "rules" to apply to the tags
read from the XML. Whenever a tag is fully parsed (including the
</closing tag>!) the module calls the specified subroutine (or builtin)
to massage/filter/process the data from the tag. Whatever the
subroutine returns is then made available to the subroutine specified
for the parent tag.
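
To make the shape of the "rules" value concrete, here is a small sketch in plain Perl that does not load the module at all. It only shows that the array holds tag/handler pairs (so four rules make eight elements), and that a handler is either a string naming a builtin or a reference to an anonymous subroutine. The two-rule list is a shortened, illustrative copy of the one above.

```perl
use strict;
use warnings;

# "=>" is just a comma that also quotes the bare word on its left.
my $rules = [
    Id   => 'content',                                    # name of a builtin rule
    Item => sub { $_[1]->{Name} => $_[1]->{_content} },   # anonymous subroutine
];

my $count  = scalar @$rules;    # 4 elements, that is 2 tag/handler pairs
my %by_tag = @$rules;           # a list of pairs converts directly to a hash

for my $tag (sort keys %by_tag) {
    my $kind = ref($by_tag{$tag}) eq 'CODE' ? 'subroutine' : 'builtin name';
    print "$tag: $kind\n";
}
```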

I am wrong, correct?

A) Correct, you were incorrect.
B) Incorrect, you were correct.
C) You're still buying beer.

To start with specific questions, could someone explain:

> Item => sub {$_[1]->{Name} => $_[1]->{_content}}

In this particular case, whenever the <Item ....>...</Item> is fully
parsed this subroutine is called. It ignores the Type attribute and
returns just the value of the Name attribute and the tag content in
such a way that the first becomes a key and the latter the value in
the attribute hash of the parent tag, in this case <DocSum>.
Later on, once the </DocSum> closing tag is parsed, all the values
from all the <Item> tags within that <DocSum> will be available in
the subroutine specified for the <DocSum> tag in the hash referenced
by $_[1] like this:

$_[1]->{Name}        # the value will be "MYC"
$_[1]->{Description} # the value will be "v-myc myelocytomatosis viral
                     # oncogene homolog (avian)"

etc.
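
The flow described above can be imitated in plain Perl without the module. This is only a rough simulation under stated assumptions: the @items list is a hand-written stand-in for three parsed <Item> tags, and the loop mimics just the one step of storing each returned pair in the parent's hash. The real module does considerably more.

```perl
use strict;
use warnings;

# The Item rule from the script: returns a (key, value) pair.
my $item_rule = sub { $_[1]->{Name} => $_[1]->{_content} };

# Hypothetical stand-ins for parsed <Item> tags: attributes plus _content.
my @items = (
    { Name => 'Name',        Type => 'String', _content => 'MYC' },
    { Name => 'Chromosome',  Type => 'String', _content => '8' },
    { Name => 'Description', Type => 'String',
      _content => 'v-myc myelocytomatosis viral oncogene homolog (avian)' },
);

# Imitate the state at </DocSum>: every child's returned pair
# has been stored in the parent's attribute hash.
my %docsum;
for my $item (@items) {
    # handlers receive the tag name first and the attribute hash second
    my ($key, $value) = $item_rule->('Item', $item);
    $docsum{$key} = $value;
}

print "$docsum{Name} is on chromosome $docsum{Chromosome}\n";
# prints "MYC is on chromosome 8"
```

Note that the Type attribute never reaches %docsum, because the Item rule does not return it.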

HTH, Jenda
===== Jenda@xxxxxxxxxxx === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
