XML::Twig parser junk.

From: Jason Casden (caz_at_monkey.org)
Date: 04/27/04

  • Next message: John Bokma: "Re: XML::Twig parser junk."
    Date: 27 Apr 2004 09:47:05 -0700
    
    

    Hi, I'm trying to use XML::Twig to insert dates into <unitdate> tags
    to create an EAD finding aid, but I'm running into a strange XML::Twig
    parser error. My file basically looks like this:

    <c02 level="subseries"><did><unittitle>Favorite story (Radio program)</unittitle><unitdate era="ce" calendar="gregorian"></unitdate></did>
    <c03 level="item">
    <did>
    <container type="Volume">SPEC.L&amp;L.SCRIPTS.FS.v.1</container>
    <unittitle>Meridian 7-1212</unittitle>
    <unitdate era="ce" calendar="gregorian">1946</unitdate>
    </did>
    </c03>
    <c03 level="item">
    <did>
    <container type="Volume">SPEC.L&amp;L.SCRIPTS.FS.v.2</container>
    <unittitle>Wuthering Heights</unittitle>
    <unitdate era="ce" calendar="gregorian">1946</unitdate>
    </did>
    </c03>
    </c02>
    <c02 level="subseries"><did><unittitle>Favorite story (Television Program)</unittitle><unitdate era="ce" calendar="gregorian"></unitdate></did>
    <c03 level="item">
    <did>
    <container type="Volume">SPEC.L&amp;L.SCRIPTS.FS.TV.v.13</container>
    <unittitle>How much land does a man need? (Script)</unittitle>
    <unitdate era="ce" calendar="gregorian">1952</unitdate>
    </did>
    </c03>
    <c03 level="item">
    <did>
    <container type="Volume">SPEC.L&amp;L.SCRIPTS.FS.TV.v.13</container>
    <unittitle>The magician (script)</unittitle>
    <unitdate era="ce" calendar="gregorian">1952</unitdate>
    </did>
    </c03>
    </c02>

    (The real file has many more c03 records than shown here.)

    When the file had only one <c02> element enclosing a bunch of c03
    item records, my XML::Twig code worked fine. Now that I have multiple
    <c02> elements, parsing fails right after the first <c02> is closed,
    where the second one opens:

    junk after document element at line 939, column 0, byte 28245 at
    C:/Perl/site/lib/XML/Parser.pm line 187

    (Line 939 of the XML file corresponds to the second <c02
    level="subseries">.)
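    A minimal sketch of what that message usually means, assuming the
    file really does contain several top-level <c02> elements as in the
    excerpt: XML allows exactly one root element, so once the first
    top-level element closes, expat (which XML::Twig uses through
    XML::Parser) rejects anything that follows as "junk after document
    element". The sketch parses a two-<c02> string as-is, then again
    inside a single root. The <wrapper> root name is hypothetical, and
    string wrapping like this assumes the content has no XML declaration
    or DOCTYPE of its own.

```perl
use strict;
use warnings;
use XML::Twig;

# Two sibling top-level elements, shaped like the two <c02> blocks above.
my $fragment = '<c02 level="subseries"><did/></c02>'
             . '<c02 level="subseries"><did/></c02>';

# As-is, the parse dies when the second top-level element starts.
my $bad_ok  = eval { XML::Twig->new->parse($fragment); 1 };
my $bad_err = $@;
print 'as-is: ', ($bad_ok ? "parsed\n" : "failed: $bad_err");

# Inside a single (hypothetical) <wrapper> root, the same content is well-formed.
my $twig    = XML::Twig->new;
my $good_ok = eval { $twig->parse("<wrapper>$fragment</wrapper>"); 1 };
print 'wrapped: ', ($good_ok ? "parsed\n" : "failed: $@");
```

    For a real file this would mean reading it into a string and calling
    parse instead of parsefile; keeping the <c02> elements inside one
    enclosing element in the file itself would avoid the wrapper
    altogether.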

    This line and the first <c02> line are nearly identical, so I'm at a
    loss as to why this is happening. We ran the file through an XML
    validator with no problems. Here is the Twig code I'm using:

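    use XML::Twig;

    # @datearr (start/end date pairs) is filled earlier in the script.
    my $arrindex = 0;

    # Fire only for a <unitdate> inside the <did> of a <c02>.
    my $twig_handlers = {'c02/did/unitdate' => \&unit_date_func};

    my $twig = XML::Twig->new(TwigHandlers => $twig_handlers);

    $twig->parsefile($xmlfile);

    # Overwrite the input file with the updated, indented document.
    open my $out, '>:utf8', $xmlfile or die "Cannot write $xmlfile: $!";
    $twig->set_pretty_print('indented');
    $twig->print($out);
    close $out;

    # subprocedures
    sub unit_date_func
    {
        my ($t, $u_date) = @_;
        # Join the next start/end pair from @datearr as "start-end".
        my $date_string = $datearr[$arrindex] . '-' . $datearr[$arrindex + 1];
        $arrindex += 2;
        $u_date->set_text($date_string);
    }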
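    A self-contained sketch of the same handler, run on a small inline
    sample so it can be tried without the real file. The @datearr values
    are made-up placeholders and <wrapper> is a hypothetical root element.
    It also illustrates that the 'c02/did/unitdate' trigger only touches
    the subseries-level dates and leaves the <c03> item dates alone.

```perl
use strict;
use warnings;
use XML::Twig;

# Placeholder start/end date pairs (hypothetical; the real script fills @datearr).
my @datearr  = ('1946', '1950', '1952', '1954');
my $arrindex = 0;

# Same logic as unit_date_func above, using scalar element access $datearr[...].
sub unit_date_func {
    my ($t, $u_date) = @_;
    my $date_string = $datearr[$arrindex] . '-' . $datearr[$arrindex + 1];
    $arrindex += 2;
    $u_date->set_text($date_string);
    return 1;
}

# Two subseries under one root; the c03 date must not be rewritten.
my $xml = <<'XML';
<wrapper>
<c02 level="subseries"><did><unitdate era="ce" calendar="gregorian"></unitdate></did>
<c03 level="item"><did><unitdate era="ce" calendar="gregorian">1946</unitdate></did></c03>
</c02>
<c02 level="subseries"><did><unitdate era="ce" calendar="gregorian"></unitdate></did></c02>
</wrapper>
XML

my $twig = XML::Twig->new(
    twig_handlers => { 'c02/did/unitdate' => \&unit_date_func },
);
$twig->parse($xml);

# Collect every <unitdate> text in document order.
my @filled = map { $_->text } $twig->root->descendants('unitdate');
print "$_\n" for @filled;
```

    Each <c02> date consumes two entries of @datearr, so the array needs
    two values per subseries in document order.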

    If anyone has any ideas about this, please help! :-)

    Jason

