Re: parsing to XML



<steeve_dun@xxxxxxxxxxxx> wrote in comp.lang.perl.misc:
> Hi everybody,
> I have a document that includes definitions.
> What I want is parsing the document and saving these definitions in a
> xml document.
> Is there a simple way to do so?
> Thank you!
>
> Example:
> #### beginning of document ####
> \glossary{HTML} {HyperText Markup Language} is the lingua franca for
> publishing hypertext on the \glossary {WWW}{World Wide Web}
> #### end of document ####

Your example doesn't show the variability of the data. Examples never
do, they only ever give a lower bound. There can always be a variant
that doesn't happen to appear in the example.

Can a "definition" span lines? Assuming that it can, you can't process
the text line-wise without major trickery. You'll need all of it in
memory. Here is a method that extracts the definitions from the text
and puts them in a hash:

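my $text = <<'END_TEXT';
\glossary{HTML} {HyperText Markup Language} is the lingua franca for
publishing hypertext on the \glossary {WWW}{World Wide Web}
END_TEXT

# In list context, a /g match returns all captures as a flat
# (term, definition, term, definition, ...) list, which fills the hash.
# [^}]* also matches newlines, so a definition may span lines.
my %definition_for = $text =~ /\\glossary\s*\{([^}]*)\}\s*\{([^}]*)\}/g;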
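If the text lives in a file rather than a here-document, getting "all of it in memory" could look roughly like the sketch below. The helper name slurp_file and the temporary demo file are made up for illustration; only the regex comes from the snippet above (with the braces escaped, which newer perls prefer).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    local $/;          # no input record separator: <$fh> returns everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demo only: write a sample with a definition that spans two lines
# to a temporary file, then slurp and parse it.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "\\glossary{HTML} {HyperText Markup\nLanguage} is the lingua franca\n";
close $tmp_fh;

my $text = slurp_file($tmp_name);
my %definition_for = $text =~ /\\glossary\s*\{([^}]*)\}\s*\{([^}]*)\}/g;

print "$_ => $definition_for{$_}\n" for sort keys %definition_for;
```

Note that a definition spanning lines keeps its embedded newline. Whether to collapse that to a single space depends on what the output should look like.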

Generating XML from the hash is probably a job for one of the XML modules.
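To show the shape of the output, here is a minimal hand-rolled sketch that uses only core Perl instead of an XML module. The element names (glossary, entry, term, definition) and the function names are assumptions, since the desired output format wasn't shown. A real XML module would take care of escaping and encoding for you.

```perl
use strict;
use warnings;

# Sample data, shaped like the %definition_for hash built by the regex match.
my %definition_for = (
    HTML => 'HyperText Markup Language',
    WWW  => 'World Wide Web',
);

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must come first, or it would re-escape the others
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build the document as a string. Element names are invented for illustration.
sub glossary_to_xml {
    my (%def) = @_;
    my $xml = qq{<?xml version="1.0"?>\n<glossary>\n};
    for my $term (sort keys %def) {
        $xml .= "  <entry>\n";
        $xml .= "    <term>" . xml_escape($term) . "</term>\n";
        $xml .= "    <definition>" . xml_escape($def{$term}) . "</definition>\n";
        $xml .= "  </entry>\n";
    }
    $xml .= "</glossary>\n";
    return $xml;
}

print glossary_to_xml(%definition_for);
```

Sorting the keys only makes the output order predictable. A hash doesn't remember the order in which the definitions appeared in the text.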

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.