Question about scoping



Hi all,

Some extracts from a program that I recently worked on follow.

The program is a "listener" that waits for data coming in on a
specific port. The incoming messages will contain 2 XML records, each
of which must be validated.

This program was exhibiting the characteristics of a memory leak. When
started up it would consume a certain amount of memory, but over time
the memory in use would grow and grow.

Here's the code I found (not the whole program). After that I'll show
my changes and ask the actual question...

<code snippets start>
#! /usr/bin/perl

use strict;
use IO::Socket;
use XML::DOM;
use English;
use DBI;
use lib "/usr/local/PostOffice/progs/modules";
use QueuesDatabase;

.......
.......
my $line; # will contain all the input received
my $data; # will contain the current input

# get the input from the client
my $bytesRead = sysread($new_sock, $data, 2048);

# the data received can be more than 2048 characters, so we need to
# keep reading if we haven't received the end of transmission character
while ($bytesRead)
{
    $line = $line . $data;

    # if it is the end of transmission, set the variable to 0 to exit
    # the while loop
    if ($line =~ /\x03$/) # character (0x03)
    {
        $bytesRead = 0;
    }
    # otherwise keep reading from the socket
    else
    {
        $bytesRead = sysread($new_sock, $data, 2048);
    }
}

# make sure all the conditions are met
validateFile($line);

......
......


sub validateFile
{
    # get the value passed to this subroutine
    my $dataString = shift;

    # remove the newline character and all other formatting characters
    chomp $dataString;
    $dataString =~ s/\t//g;
    $dataString =~ s/\n//g;
    $dataString =~ s/\r//g;
    $dataString =~ s/\f//g;

    # check for and remove the starting character for the data transfer
    if ($dataString =~ /^\x02/) # character (0x02)
    {
        $dataString =~ s/^\x02//; # character (0x02)

        # check for and remove the ending character for the data transfer
        if ($dataString =~ /\x03$/) # character (0x03)
        {
            $dataString =~ s/\x03$//; # character (0x03)

            # make sure the very first part of the file is a valid xml
            # header with utf-8 encoding included
            unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
            {
                $errMsg = "Invalid xml header for the routing header.";
            }
            else
            {
                # now check for the second xml header for the body file
                unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
                {
                    $errMsg = "Invalid xml header for the xml body file.";
                }
                else
                {
                    # separate the xml header and xml body file into 2
                    # different variables
                    $dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
                    my $headerFile = $1;
                    my $bodyFile = $2;

                    # parse the xml headerFile to make sure that it is a
                    # well-formed xml file
                    eval {new XML::DOM::Parser->parse($headerFile)}; # create a new parser object and load the input file

                    # parser will have died with error message in
                    # $EVAL_ERROR if XML is not well-formed
                    if($EVAL_ERROR)
                    {
                        $errMsg = "XML header file is not well-formed and can't be processed.";
                    }
                    else
                    {
                        # parse the xml bodyFile to make sure that it is a
                        # well-formed xml file
                        eval {new XML::DOM::Parser->parse($bodyFile)}; # create a new parser object and load the input file

                        # parser will have died with error message in
                        # $EVAL_ERROR if XML is not well-formed
                        if($EVAL_ERROR)
                        {
                            $errMsg = "XML body file is not well-formed and can't be processed.";
                        }
                    }
                }
            }
        }
        else
        {
            $errMsg = "No closing flag found for the data.";
        }
    }
    else
    {
        $errMsg = "No opening flag found for the data.";
    }
}

<code snippets end>

So the program uses "strict" and 'my' is used in all the subroutines.

Now I made the following change....

<modified snippet starts>
{
    # separate the xml header and xml body file into 2 different
    # variables
    $dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
    my $headerFile = $1;
    my $bodyFile = $2;

    # parse the xml headerFile to make sure that it is a well-formed
    # xml file
    my $headerXML = eval {new XML::DOM::Parser->parse($headerFile)}; # create a new parser object and load the input file

    # parser will have died with error message in $EVAL_ERROR if
    # XML is not well-formed
    if($EVAL_ERROR)
    {
        $errMsg = "XML header file is not well-formed and can't be processed.";
    }
    else
    {
        # parse the xml bodyFile to make sure that it is a well-formed
        # xml file
        my $bodyXML = eval {new XML::DOM::Parser->parse($bodyFile)}; # create a new parser object and load the input file

        # parser will have died with error message in $EVAL_ERROR
        # if XML is not well-formed
        if($EVAL_ERROR)
        {
            $errMsg = "XML body file is not well-formed and can't be processed.";
        }
        else
        {
            $bodyXML->dispose;
        }
        $headerXML->dispose;
    }
}
<modified snippet ends>

So what I did was to assign the result of each XML::DOM::Parser parse
call to a variable and then call the dispose method on it once the XML
has been checked for well-formedness. Result: memory usage, except for
the brief period when a document is being parsed, is more or less
constant.
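
For anyone who wants to try the same thing, the parse-then-dispose
pattern from the modified snippet can be pulled into a small helper.
This is only a sketch: isWellFormed is a name made up for illustration
and is not part of the program, and it assumes, as the snippets above
do, that parse dies when the XML is not well-formed.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not in the original program): returns 1 if $xml
# parses as well-formed XML, 0 otherwise.
sub isWellFormed
{
    my $xml = shift;

    # parse() dies on XML that is not well-formed; eval traps that and
    # leaves $doc undefined
    my $doc = eval { XML::DOM::Parser->new->parse($xml) };
    return 0 unless defined $doc;

    # release the parsed document explicitly, as in the modified snippet
    $doc->dispose;
    return 1;
}

print isWellFormed('<a><b/></a>') ? "ok\n" : "not well-formed\n";
print isWellFormed('<a><b></a>')  ? "ok\n" : "not well-formed\n";
```

With a helper like this, validateFile would only need to call it once
for $headerFile and once for $bodyFile, and the dispose calls could not
be forgotten on one of the branches.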

So my question is about the scoping in the original code. Within a
subroutine there is a line like
eval {new XML::DOM::Parser->parse($headerFile)};

How is that parser object scoped? I imagine that the author of the
code expected the object to disappear out of memory once the
subroutine was exited.

So why the constant growth in memory usage? I can think of only two
possibilities....

1) eval {new XML::DOM::Parser->parse($headerFile)}; results in
something that is globally scoped

2) XML::DOM::Parser creates a whole lot of other objects/variables in
memory that persist even when the actual XML::DOM::Parser object
passes out of scope.

Or is there another reason?
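
To show what I mean by possibility 2, here is a plain Perl sketch with
no XML::DOM in it. It assumes the parsed tree holds references in both
directions (parent to child and child to parent), which is my guess at
why a dispose method exists at all; I have not checked the module
source. Node, add and $destroyed are made-up names for illustration.

```perl
use strict;
use warnings;

my $destroyed = 0;   # counts how many Node objects have been freed

package Node;        # hypothetical stand-in for a DOM node

sub new { my ($class) = @_; return bless { parent => undef, kids => [] }, $class }

# attach a child; parent and child now reference each other
sub add
{
    my ($self, $kid) = @_;
    $kid->{parent} = $self;
    push @{ $self->{kids} }, $kid;
    return $kid;
}

# break the references in both directions, children first
sub dispose
{
    my ($self) = @_;
    $_->dispose for @{ $self->{kids} };
    $self->{kids}   = [];
    $self->{parent} = undef;
}

sub DESTROY { $destroyed++ }

package main;

# Case 1: the tree goes out of scope with its cycle intact
{
    my $root = Node->new;
    $root->add(Node->new);
}
my $afterLeak = $destroyed;

# Case 2: same tree, but dispose is called before it goes out of scope
{
    my $root = Node->new;
    $root->add(Node->new);
    $root->dispose;
}
my $afterDispose = $destroyed;

print "without dispose: $afterLeak destroyed\n";   # prints 0
print "with dispose:    $afterDispose destroyed\n"; # prints 2
```

In case 1 the lexical $root is gone at the end of the block, yet neither
node is freed, because each still holds a reference to the other. If the
DOM tree behaves like this, that would explain the growth without
anything being globally scoped.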

Thanks

Bob
