Re: Will XML::Simple work with keys, strings, integers, and dates?

From: Jim Gibson (jgibson_at_mail.arc.nasa.gov)
Date: 03/23/05

  • Next message: John Bokma: "Re: Will XML::Simple work with keys, strings, integers, and dates?"
    Date: Tue, 22 Mar 2005 18:13:45 -0800
    
    

    In article <th6%d.4321$C7.2935@news-server.bigpond.net.au>, Wes Barris
    <noway@nohow.com> wrote:

    > Hi,

    I haven't seen anybody else post a response, so I will give it a shot.
    I have only done a little XML processing in Perl (and some in Java),
    but I am not an expert.

    >
    > I am trying to use XML::Simple to parse an xml file. However, the xml file
    > that I am trying to parse is not in the same format as any of the XML::Simple
    > examples that I have seen. In all of the examples I have seen, the xml tags
    > are specific to their contents. In my xml file, the tag names are generic.
    > Here is a short sample of the xml that I am trying to parse:
    >
    > <dict>
    > <key>35</key>
    > <dict>
    > <key>Track ID</key><integer>35</integer>
    > <key>Name</key><string>Earache My Eye (Full Version)</string>
    > <key>Artist</key><string>Alice Bowie</string>

    [XML lines snipped]

    Is that really your XML? You have nested tags with the same name:
    <dict>. You also have <key> tags at different levels. That is going to
    make parsing more difficult.
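    To see why this layout is awkward for XML::Simple in particular, here
    is a minimal sketch, assuming XML::Simple is installed, that loads one
    track <dict> from your sample. XML::Simple groups child elements by tag
    name, so the keys, integers, and strings end up in separate arrays, and
    the ordering that tied each key to its value is lost. KeyAttr => [] only
    switches off XML::Simple's default folding on 'name', 'key', and 'id'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# One track <dict> taken from the sample XML.
my $xml = <<'XML';
<dict>
<key>Track ID</key><integer>35</integer>
<key>Name</key><string>Earache My Eye (Full Version)</string>
<key>Artist</key><string>Alice Bowie</string>
</dict>
XML

# ForceArray => 1 makes every child element an array reference;
# KeyAttr => [] disables the default folding on name/key/id.
my $track = XMLin($xml, ForceArray => 1, KeyAttr => []);

# Children are grouped by tag name: three <key>s, one <integer>
# and two <string>s land in three unrelated arrays.
print "keys:     @{ $track->{key} }\n";
print "integers: @{ $track->{integer} }\n";
print "strings:  @{ $track->{string} }\n";
```

    Here keys[1] ("Name") happens to belong with strings[0], but only
    because exactly one <integer> came first; nothing in the resulting
    structure records that.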

    > </dict>
    > <key>36</key>
    > <dict>
    > <key>Track ID</key><integer>36</integer>
    > <key>Name</key><string>Earache My Eye</string>
    > <key>Artist</key><string>Cheech &#38; Chong</string>

    [more lines snipped]

    > </dict>
    >
    > I would like to be able to extract things like the "Name", "Artist", and
    > "Location" but I don't understand how to associate one of the elements of
    > the key array with one of the elements of the resulting string array.

    You have some very poorly designed XML there. It would be better if it
    were something like

       <attribute name="Track ID" value="35"> ...
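    With a layout like that, XML::Simple can do the work directly. A
    minimal sketch, assuming XML::Simple is installed; the <library> and
    <track> wrapper elements are hypothetical, and only <attribute> comes
    from the suggestion above. KeyAttr folds each track's <attribute>
    elements into a hash keyed on the name attribute.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical redesign: one <attribute> element per field,
# wrapped in made-up <library>/<track> elements.
my $xml = <<'XML';
<library>
  <track>
    <attribute name="Track ID" value="35"/>
    <attribute name="Name" value="Earache My Eye (Full Version)"/>
    <attribute name="Artist" value="Alice Bowie"/>
  </track>
  <track>
    <attribute name="Track ID" value="36"/>
    <attribute name="Name" value="Earache My Eye"/>
    <attribute name="Artist" value="Cheech &amp; Chong"/>
  </track>
</library>
XML

# ForceArray keeps <track> and <attribute> as lists even when there is
# only one; KeyAttr turns each track's attribute list into a hash keyed
# by the name attribute (the name itself is removed from the inner hash).
my $data = XMLin($xml,
    ForceArray => ['track', 'attribute'],
    KeyAttr    => { attribute => 'name' },
);

for my $track (@{ $data->{track} }) {
    my $attr = $track->{attribute};
    printf "%s: %s by %s\n",
        $attr->{'Track ID'}{value},
        $attr->{Name}{value},
        $attr->{Artist}{value};
}
```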

    If you cannot change the XML definition, then you are probably better
    off using a SAX parser. XML::SAX::PurePerl works, but it is slow. For
    big files, try XML::Parser, which is built on the expat C library. In
    the one case where I compared them, it was about 75 times faster.
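    For comparison, here is a rough sketch of the same key/value pairing
    using XML::Parser's own handler interface (Start, End, and Char
    callbacks) rather than SAX. It assumes XML::Parser and expat are
    installed; the sample is trimmed to one track, and collecting
    [key, value, type] triples in @pairs is just one choice for
    illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<doc>
<key>35</key>
<dict>
<key>Track ID</key><integer>35</integer>
<key>Name</key><string>Earache My Eye (Full Version)</string>
<key>Artist</key><string>Alice Bowie</string>
</dict>
</doc>
XML

my ($key, $text) = ('', '');
my @pairs;    # collected [key, value, element type] triples

my $parser = XML::Parser->new(Handlers => {
    # Start a fresh text buffer at every element.
    Start => sub { $text = '' },
    # expat may deliver one text node in several pieces, so accumulate.
    Char  => sub { my ($expat, $string) = @_; $text .= $string },
    End   => sub {
        my ($expat, $element) = @_;
        if ($element eq 'key') {
            $key = $text;                          # remember the key
        } elsif ($text =~ /\S/) {
            push @pairs, [$key, $text, $element];  # value for that key
        }
        $text = '';
    },
});
$parser->parse($xml);

print "<$_->[0]> is '$_->[1]' of type $_->[2]\n" for @pairs;
```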

    With a SAX parser, you write a handler package whose callbacks are
    invoked for each start tag, end tag, and run of character data in the
    XML. Because the callbacks arrive in document order, you can associate
    each <key> value with the value element (<string>, <integer>, etc.)
    that follows it.

    Something like this might get you started:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use XML::SAX::PurePerl;

    my $xmlstring;
    {
      local $/;
      $xmlstring = <DATA>;
    }

    my $handler = My::XML::Handler->new();
    my $parser = XML::SAX::PurePerl->new(Handler => $handler);
    $parser->parse_string($xmlstring);

    package My::XML::Handler;
    sub new
    {
      my $class = shift;
      my $self = {
        'key' => '',
        'data' => ''
      };
      bless $self, $class;
    }

    sub start_document { print "start_document\n"; }

    # Begin a fresh text buffer for every element.
    sub start_element
    {
      my( $self, $element ) = @_;
      $self->{data} = '';
    }

    # A <key> remembers its text; any other element with non-blank text
    # is reported as the value for the most recent key. Testing /\S/
    # (rather than truth) keeps a value of "0" from being skipped.
    sub end_element
    {
      my( $self, $element ) = @_;
      my $name = $element->{Name};
      if( $name eq 'key' ) {
        $self->{key} = $self->{data};
      } elsif( $self->{data} =~ /\S/ ) {
        print "<$self->{key}> is '$self->{data}' of type $name\n";
      }
      $self->{data} = '';
    }

    # The parser may deliver one text node in several pieces,
    # so append rather than overwrite.
    sub characters
    {
      my( $self, $element ) = @_;
      $self->{data} .= $element->{Data};
    }

    sub end_document { print "end_document\n"; }
    sub warning { print "warning\n"; }
    sub error { print "error\n"; }

    1;

    # Back to main, so that __DATA__ below is read as main::DATA.
    package main;

    __DATA__
    <doc>
    <key>35</key>
    <dict>
    <key>Track ID</key><integer>35</integer>
    <key>Name</key><string>Earache My Eye (Full Version)</string>
    <key>Artist</key><string>Alice Bowie</string>
    </dict>
    <key>36</key>
    <dict>
    <key>Track ID</key><integer>36</integer>
    <key>Name</key><string>Earache My Eye</string>
    <key>Artist</key><string>Cheech &#38; Chong</string>
    </dict>
    </doc>

    Which produces:

    start_document
    <Track ID> is '35' of type integer
    <Name> is 'Earache My Eye (Full Version)' of type string
    <Artist> is 'Alice Bowie' of type string
    <Track ID> is '36' of type integer
    <Name> is 'Earache My Eye' of type string
    <Artist> is 'Cheech & Chong' of type string
    end_document
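    If you want fields such as "Name" and "Artist" grouped per track rather
    than printed as they go by, one approach is to count how deeply <dict>
    elements are nested and treat each second-level <dict> as one track.
    This is a sketch under that assumption about your file's layout (the
    package name My::TrackCollector is made up). It subclasses
    XML::SAX::Base so the events it does not override fall through to
    harmless defaults, and it uses your original nested <dict> markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX::PurePerl;

package My::TrackCollector;    # hypothetical handler name
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{depth}  = 0;     # current <dict> nesting level
    $self->{tracks} = [];    # finished track records
}

sub start_element {
    my ($self, $el) = @_;
    $self->{data} = '';
    # A second-level <dict> starts a new track record.
    if ($el->{Name} eq 'dict' && ++$self->{depth} == 2) {
        $self->{track} = {};
    }
}

sub characters {
    my ($self, $chars) = @_;
    $self->{data} .= $chars->{Data};
}

sub end_element {
    my ($self, $el) = @_;
    my $name = $el->{Name};
    if ($name eq 'dict') {
        # Closing a second-level <dict> finishes the current track.
        push @{ $self->{tracks} }, $self->{track} if $self->{depth}-- == 2;
    } elsif ($name eq 'key') {
        $self->{key} = $self->{data};
    } elsif ($self->{depth} == 2) {
        $self->{track}{ $self->{key} } = $self->{data};
    }
    $self->{data} = '';
}

package main;

my $xml = <<'XML';
<dict>
<key>35</key>
<dict>
<key>Track ID</key><integer>35</integer>
<key>Name</key><string>Earache My Eye (Full Version)</string>
<key>Artist</key><string>Alice Bowie</string>
</dict>
<key>36</key>
<dict>
<key>Track ID</key><integer>36</integer>
<key>Name</key><string>Earache My Eye</string>
<key>Artist</key><string>Cheech &#38; Chong</string>
</dict>
</dict>
XML

my $handler = My::TrackCollector->new;
XML::SAX::PurePerl->new(Handler => $handler)->parse_string($xml);

for my $track (@{ $handler->{tracks} }) {
    print "$track->{Name} by $track->{Artist}\n";
}
```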
