Re: Regular Expression for XML Parsing



On 27 Dec, 20:59, tushar.sax...@xxxxxxxxx wrote:
Hi,

I have a set of XML files from which I need to extract some data. The
format of the file is as follows :

<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>

I need to extract the DATA part of the xml structure

Note : tag3 can be contained either within tag1 or tag2, but I need to
extract data only from tag1. i.e. DATA1 should be extracted, but not
DATA2

If I want to get both DATA1 and DATA2 I can use a simple regex like :

if (($_ =~ /<tag3>(\w+)<\/tag3>/g))
{
print $1

}

But to get only DATA1 (embedded within tag1), I tried something like
this and am unable to get it to work:

if (($_ =~ /<tag1>[\n\s\S\w\W]*<tag2>(\w+)<\/tag2>[\n\s\S\w\W]*<\/tag1>/g))
{
print $1

}

In this second case, the match itself fails.

Any help would be appreciated !

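# Paragraph mode: read blank-line-separated chunks, so each
# <tag1>...</tag1> block arrives as one record (assuming the
# blocks are separated by blank lines, as in the sample).
$/ = "";

while (<>) {
    # /s lets . match newlines; the non-greedy .*? keeps the match short.
    if ( m{<tag1>.*?<tag3>(\w+)</tag3>.*?</tag1>}gs )
    {
        print "$1\n";
    }
}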
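
Note that the second attempt in the question looks for <tag2> inside <tag1>, while the sample data has <tag3> there, so that pattern cannot match the sample as shown. If a single <tag1> block may hold more than one <tag3>, a two-step variant might be easier to follow. The following is only a rough sketch, not a definitive solution: extract_tag1_data is a hypothetical helper name, and it assumes the whole file fits in one string, that the tags carry no attributes, and that <tag1> blocks are never nested. For anything less regular than the sample, a real XML parser module is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the text of
# every <tag3> element that sits inside a <tag1> block.
sub extract_tag1_data {
    my ($xml) = @_;
    my @data;

    # Step 1: grab each <tag1>...</tag1> block. /s lets . match newlines,
    # and the non-greedy .*? stops at the first closing </tag1>.
    while ( $xml =~ m{<tag1>(.*?)</tag1>}gs ) {
        my $block = $1;

        # Step 2: collect every <tag3> value inside that block only.
        push @data, $block =~ m{<tag3>(\w+)</tag3>}g;
    }
    return @data;
}

# Sample input copied from the question.
my $sample = <<'XML';
<tag1>
<tag3>DATA1</tag3>
</tag1>

<tag2>
<tag3>DATA2</tag3>
</tag2>
XML

print "$_\n" for extract_tag1_data($sample);
```

On the sample from the question this prints DATA1 only, because the <tag3> inside <tag2> is never inside a captured <tag1> block.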