Re: Question about scoping



On Mar 9, 4:58 pm, "Bob Dubery" <megap...@xxxxxxxxx> wrote:
On Mar 5, 8:57 am, Ben Morrow <b...@xxxxxxxxxxxx> wrote:
I'm amazed this syntax works at all. It appears to be parsed as

eval { (new XML::DOM::Parser)->parse($headerFile) };

or

eval {XML::DOM::Parser->new->parse($headerFile) };

which would be the recommended way to write it. The indirect object
notation ('new Foo' as opposed to 'Foo->new') is considered to have been
a bad idea: it confuses the reader, and under the wrong circumstances it
can confuse Perl.
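
To make the two notations concrete, here is a minimal sketch. The Greeter package is purely hypothetical (it is not from this thread) and stands in for any class with a new method; it assumes a perl where indirect object syntax is still enabled, which is the default unless a recent feature bundle turns it off.

```perl
use strict;
use warnings;

# Hypothetical class, used only to compare the two call styles.
package Greeter;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

sub greet {
    my ($self) = @_;
    return "hello, $self->{name}";
}

package main;

# Arrow notation: the class name is unambiguously the invocant.
my $arrow = Greeter->new(name => 'Bob')->greet;

# Indirect object notation: perl has to guess that 'new' is a method
# and 'Greeter' is a class, which is where the confusion can start.
my $indirect = (new Greeter(name => 'Bob'))->greet;

print "$arrow\n$indirect\n";
```

Both calls build the same object here; the arrow form simply leaves nothing for the parser (or the reader) to guess.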

Well thanks for that. I will try making that change to this program.

<snip>
Since I changed the statement that you're commenting on to something
like
my $bodyXML = eval {new XML::DOM::Parser->parse($bodyFile)};

and going on to perform a $bodyXML->dispose()

the listener has run more reliably and memory usage is reduced. It is
still creeping up with time, but much more slowly.

So I'm going to try the alternate notation that you offer and see if
that brings a further improvement.

And it DID improve things... but the memory consumed by the program
was still creeping up.

In the end I tried XML::Parser::Expat instead. This means some changes
to the code as XML::Parser::Expat objects cannot be reused and may
only perform ONE parse. But the leak is fixed.
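
As a rough sketch of the "one parser per parse" pattern, the check can be wrapped in a small helper. The name is_well_formed is made up for illustration and is not part of XML::Parser::Expat; the sketch assumes the module is installed and relies only on new, parse (which dies on malformed input) and release.

```perl
use strict;
use warnings;
use XML::Parser::Expat;

# Hypothetical helper: returns 1 if $xml is well-formed, 0 otherwise.
# A fresh parser is created on every call because an Expat parser
# object can perform only one parse.
sub is_well_formed {
    my ($xml) = @_;
    my $parser = XML::Parser::Expat->new(ProtocolEncoding => 'UTF-8');

    # parse() dies on malformed input, so eval turns that into a flag.
    my $ok = eval { $parser->parse($xml); 1 } ? 1 : 0;

    # Break the parser's internal circular references so it can be freed.
    $parser->release;
    return $ok;
}

for my $doc ('<a><b/></a>', '<a><b></a>') {
    print "$doc: ", (is_well_formed($doc) ? 'ok' : 'not well-formed'), "\n";
}
```

Keeping the parser inside the helper means it goes out of scope as soon as the check is done, whatever the outcome of the parse.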

Prior to this I had tried using XML::Parser instead of XML::DOM, but
the problem did not go away.

Is there a memory leak in XML::Parser?


The code for the validation routine now looks like this....


sub validateFile
{
# get the value passed to this subroutine
my $dataString = shift;

# eliminate tabs, line feeds etc...
$dataString =~ tr/\t\n\r\f//d;
# eliminate padding between tags...
$dataString =~ s/>\s*</></g;

# create parser objects
# these are expat parsers - cannot be reused
my $hParser = XML::Parser::Expat->new(ProtocolEncoding => 'UTF-8');
my $bParser = XML::Parser::Expat->new(ProtocolEncoding => 'UTF-8');

# check for and remove the starting character for the data transfer
if ($dataString =~ /^\x02/) # character (0x02)
{
$dataString =~ s/^\x02//; # character (0x02)

# check for and remove the ending character for the data transfer
if ($dataString =~ /\x03$/) # character (0x03)
{
$dataString =~ s/\x03$//; # character (0x03)

# make sure the very first part of the file is a valid xml header with utf-8 encoding included
unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the routing header.";
}
else
{
# now check for the second xml header for the body file
unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the xml body file.";
}
else
{
# separate the xml header and xml body file into 2 different variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
my $headerFile = $1;
my $bodyFile = $2;

# parse the xml headerFile to make sure that it is a well-formed xml file
my $headerOK = eval {$hParser->parse($headerFile)};

# parser will have died and eval failed if XML is not well-formed
unless($headerOK)
{
$errMsg = "XML header file is not well-formed and can't be processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-formed xml file
my $bodyOK = eval {$bParser->parse($bodyFile)};

# parser will have died with error message in $EVAL_ERROR if XML is not well-formed
unless($bodyOK)
{
$errMsg = "XML body file is not well-formed and can't be processed.";
}
}
}
}
}
else
{
$errMsg = "No closing flag found for the data.";
}
}
else
{
$errMsg = "No opening flag found for the data.";
}
$hParser->release;
$bParser->release;
}
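
The framing and splitting checks above can also be written with early returns instead of nested unless/else blocks. The following is only a sketch under assumptions: split_transfer and its (header, body, error) return convention are invented for illustration, and the error strings are copied from the routine above. It covers the flag and header checks only, not the Expat parsing.

```perl
use strict;
use warnings;

# The XML declaration the routine above expects on both documents.
my $XML_DECL = qr/<\?xml version="1\.0" encoding="UTF-8"\?>/;

# Hypothetical helper: returns (header, body, undef) on success,
# or (undef, undef, error message) on failure.
sub split_transfer {
    my ($data) = @_;

    $data =~ tr/\t\n\r\f//d;    # eliminate tabs, line feeds etc.
    $data =~ s/>\s*</></g;      # eliminate padding between tags

    # Strip the start (0x02) and end (0x03) flags, failing early if absent.
    $data =~ s/^\x02//
        or return (undef, undef, 'No opening flag found for the data.');
    $data =~ s/\x03$//
        or return (undef, undef, 'No closing flag found for the data.');

    $data =~ /^$XML_DECL/
        or return (undef, undef, 'Invalid xml header for the routing header.');

    # Two declarations are required: one for the header, one for the body.
    my ($header, $body) = $data =~ /^($XML_DECL.+)($XML_DECL.+)$/
        or return (undef, undef, 'Invalid xml header for the xml body file.');

    return ($header, $body, undef);
}

my $decl = '<?xml version="1.0" encoding="UTF-8"?>';
my ($hdr, $body, $err) = split_transfer("\x02$decl<h/>$decl<b/>\x03");
print defined $err ? "error: $err\n" : "header: $hdr\nbody: $body\n";
```

One difference from the routine above: because the split itself requires two declarations, a message with only one declaration is reported as a bad body header here.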

Now there is one remaining oddity... even though I have called the
release methods on the parsers, the memory consumed by the parsers
only gets released when the subroutine is executed again and the
statements

my $hParser = XML::Parser::Expat->new(ProtocolEncoding => 'UTF-8');
my $bParser = XML::Parser::Expat->new(ProtocolEncoding => 'UTF-8');

are executed.

However, this means that the memory usage will not creep. It goes up
when a large incoming message is received, and may stay up for some
time, but the next incoming message will effectively release the
memory resources by instantiating new Expat parsers.
