Re: xml parsing script dying with "Premature end of script headers" error
- From: GazK <invalid@xxxxxxxxxxxxxxx>
- Date: Wed, 29 Oct 2008 20:34:16 +0000
GazK wrote:
Curtis wrote:
GazK wrote:
I have been using an xml parsing script to parse a number of rss feeds and return relevant results to a database. The script has worked well for a couple of years, despite having very crude error-trapping (if it finds an error in one of the xml files, the script stops). Recently, the script has stopped working because one of the xml files is badly formed.
So I decided to rewrite the script with better error trapping; the script should continue with the well-formed xml files and send me an email telling me what happened.
The prototype script is failing with a "Premature end of script headers" error. I am trying to work out if:
- this is a problem with my script, or
- a problem with the web server configuration
I have been over the code with as close as I have to a fine toothcomb, and I can't see anything which would cause a problem.
Here is my code:
<?php
[snipped some function and variable declarations]
##### from here onwards the script has been rewritten
# initialise feed counter
$count = 0;
$passed = TRUE;
$body = "RSS parse results:\n";
foreach ($feedsource as $feed) {
# loop through each RSS file in turn
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if(fopen("$feed", "r")) {
# if file can be opened
$fp = fopen("$feed", "r");
First of all, this is not a good way to test if fopen() succeeded: it opens the file twice and throws away the first handle. Assign and test in one step instead:
if ($fp = fopen($feed, 'r')) {
...
}
Also, quoting a lone variable as "$var" is a bad habit: it converts the value to a string, so you may run into some unexpected typing troubles:
var_dump("$int"); // string
var_dump($int); // integer
$body .= "Success opening " . $feed . "\n";
while ($data = fread($fp, 4096)) {
# loop through feed contents
Here's where your problem probably lies. You should not parse your RSS data until you're finished collecting all the data. What happens when your RSS data exceeds the buffer? The answer is that the while-statement will start another iteration to get more data, continuing in this manner until EOF is reached. This will cause xml_parse() and other xml functions to attempt to operate on the incomplete RSS feed.
Instead, use file_get_contents(), and eliminate the loop entirely.
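A minimal sketch of that approach follows. It assumes the same handler names as the script above (startElement, endElement, characterData); the empty stand-in handlers and the helpers parseFeed() and parseAllFeeds() are hypothetical names introduced only for illustration, not part of the original script.

```php
<?php
// Stand-in handlers so the sketch runs on its own; the real script
// defines its own versions of these three functions.
function startElement($parser, $name, $attrs) {}
function endElement($parser, $name) {}
function characterData($parser, $data) {}

// Hypothetical helper: parse one feed in a single pass.
// Returns null on success, or an error message string on failure.
function parseFeed($feed)
{
    // Read the whole document first; @ hides the PHP warning so the
    // failure can be reported in the email body instead.
    $xml = @file_get_contents($feed);
    if ($xml === false) {
        return "Failed to open " . $feed;
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, "startElement", "endElement");
    xml_set_character_data_handler($parser, "characterData");

    $error = null;
    // Third argument true: this is the final (and only) chunk of data.
    if (!xml_parse($parser, $xml, true)) {
        $error = "Failed to parse " . $feed . ": XML error "
            . xml_error_string(xml_get_error_code($parser))
            . " at line " . xml_get_current_line_number($parser);
    }
    // On PHP versions before 8.0, also call xml_parser_free($parser) here.
    return $error;
}

// Hypothetical helper: keep going after a bad feed and collect a report.
// Returns array($passed, $body).
function parseAllFeeds(array $feeds)
{
    $body = "RSS parse results:\n";
    $passed = true;
    foreach ($feeds as $feed) {
        $error = parseFeed($feed);
        if ($error === null) {
            $body .= "Success parsing " . $feed . "\n";
        } else {
            $body .= $error . "\n";
            $passed = false;
        }
    }
    return array($passed, $body);
}
```

Calling parseAllFeeds($feedsource) would then give you the $passed flag and the $body text for the report email. A malformed feed only adds one line to the report; the remaining feeds are still processed.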
Here's a small scale example of what *might* be happening to you, with your current approach:
<?php
$rss = <<<EORSS
<?xml version="1.0"?>
<rss version="2.0">
<foo>
<bar>baz</bar>
</foo>
</rss>
EORSS;
$file = 'fread_test.rss';
$bufferTooSmall = ceil(strlen($rss) / 2);
// write the data - error checking removed for brevity
file_put_contents($file, $rss);
if ($fp = fopen($file, 'r')) {
$i = 1;
while ($data = fread($fp, $bufferTooSmall)) {
echo "Iteration $i:\n$data\n\n";
$i++;
}
fclose($fp);
}
?>
if(xml_parse($xml_parser, $data, feof($fp))) {
# success
$body .= "Success parsing " . $feed . "\n";
} else {
# fail
$body .= "Failed to parse " . $feed . ": XML error " . xml_error_string(xml_get_error_code($xml_parser)) . " at line " . xml_get_current_line_number($xml_parser) . "\n";
$passed = FALSE;
}
}
# close file (only reached when the file was actually opened)
fclose($fp);
} else {
# failed to open file
$body .= "Failed to open " . $feed . "\n";
$passed = FALSE;
}
# free up xml parser
xml_parser_free($xml_parser);
}
if($passed) {
# if no errors
$passText = "no errors";
} else {
$passText = "ERRORS";
}
$subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");
$to = "invalid@xxxxxxxxxxx";
mail($to, $subject, $body);
?>
Curtis, thanks for the assistance. I will give the file_get_contents() approach a go - it looks much simpler in any case.
Garry
Update - script is now working much more reliably. Old script has been binned. Thanks!