Parsing a tab delimited log file

From: John (jpoz_at_quickscribble.com)
Date: 10/29/03


To: php-general@lists.php.net
Date: Wed, 29 Oct 2003 15:12:37 -0600


(I'm reposting this as it didn't seem to take the first time. If it's a
dupe, I apologize.)

Hey all. I'm attempting to count the number of unique message IDs in a
Microsoft Exchange 2000 tracking log. While the code below works, it takes
forever to run: a 16 MB log file takes about 10 minutes.

If anyone could suggest a better way of doing this, that would be great.
Below the PHP code is an example of an Exchange tracking log (which is
tab delimited).

Thanks in advance for the help.

John

Here is the code I'm using:

<?php
ini_set("max_execution_time", 1200);
$filename="20031025.log";

//file() takes no mode argument; it reads the whole file into an array of lines
$file=file($filename);

$count=count($file);

$array=array();

//first 6 lines of the log (indexes 0-5) are header and empty lines
$i=6;
while ($i<$count)
{

$line=explode("\t",$file[$i]);

//the MSGID is always the tenth field (index 9); blank lines have no such field
$data=isset($line[9]) ? trim($line[9]) : "";

//Exchange puts a blank line between each entry in the log file; this
//creates an empty entry in the array
if (!empty($data))
{
    if (!in_array ($data, $array))
    $array[]=$data;
}

$i++;
}

$arraycount=count($array);
echo $arraycount,"<BR>";
?>
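
My guess is that most of the time goes into in_array(), which rescans the
whole list of IDs for every log line. One possible variant, as a rough sketch
only: store each message ID as an array key so duplicates collapse on their
own, and read the log one line at a time with fgets() instead of loading it
all with file(). The function name count_unique_msgids() and its $skip
parameter are made up for illustration, and the sketch assumes the MSGID is
still field index 9 and that the first 6 lines are header and empty lines.

```php
<?php
// Sketch only: count_unique_msgids() is a hypothetical helper, not a PHP built-in.
// Assumes the MSGID is the tenth tab-separated field (index 9).
function count_unique_msgids($filename, $skip = 6)
{
    $seen = array();                  // message IDs kept as keys, so duplicates collapse
    $handle = fopen($filename, "r");
    if ($handle === false) {
        return 0;                     // file could not be opened
    }
    $lineno = 0;
    while (($row = fgets($handle)) !== false) {
        if ($lineno++ < $skip) {
            continue;                 // header and empty lines at the top of the log
        }
        $fields = explode("\t", $row);
        if (isset($fields[9])) {      // blank lines between entries have no field 9
            $id = trim($fields[9]);
            if ($id !== "") {
                $seen[$id] = true;    // key lookup instead of scanning with in_array()
            }
        }
    }
    fclose($handle);
    return count($seen);
}
```

Calling count_unique_msgids("20031025.log") would then stand in for the loop
above. I have not timed this against a real tracking log.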

Here is an example of the Exchange tracking logs:

2003-10-26 0:0:35 GMT 10.0.0.1 file-server -
Exchange-Server 10.0.0.50 user@domain.com 1019
Exchange-Server0123456789@Exchange-Server.domain.com 0 0 799
1 2003-10-26 0:0:35 GMT 0 Version: 5.0.2195.5329 - -
recipient@domain.com -

2003-10-26 0:0:35 GMT 10.0.0.1 file-server -
Exchange-Server 10.0.0.50 user@domain.com 1025
Exchange-Server0123456789@Exchange-Server.domain.com 0 0 799
1 2003-10-26 0:0:35 GMT 0 Version: 5.0.2195.5329 - -
recipient@domain.com -

2003-10-26 10:13:43 GMT - - - Exchange-Server -
recipient@yahoo.com 1020
uniquemessageIDgeneratedbyexchange001002003@Exchange-Server.domain.com 0
0 952 1 2003-10-26 10:13:42 GMT 0 - - -
sender@domain.com -

2003-10-26 10:13:43 GMT - - SMTP-Server
Exchange-Server - recipient@yahoo.com 1031
uniquemessageIDgeneratedbyexchange001002003@Exchange-Server.domain.com 0
0 952 1 2003-10-26 10:13:42 GMT 0 - - -
sender@domain.com -

2003-10-26 0:45:59 GMT 192.168.1.2 SMTP-Server.domain.com -
Exchange-Server 10.0.0.50 user@domain.com 1019
Exchange-Server009800500200@Exchange-Server.domain.com 0 0 15188
1 2003-10-26 0:45:59 GMT 0 Version:
95.5329 - - postmaster@domain.com -

2003-10-26 0:45:59 GMT 192.168.1.2 SMTP-Server.domain.com -
Exchange-Server 10.0.0.50 user@domain.com 1025
Exchange-Server009800500200@Exchange-Server.domain.com 0 0 15188
1 2003-10-26 0:45:59 GMT 0 Version:
95.5329 - - postmaster@domain.com -
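
Note that the mail client has wrapped the entries above; in the real log each
entry is a single line with tabs between the fields. As a small illustration
of why the script reads index 9, here is the start of the first entry rebuilt
with explicit tabs. This is a sketch that assumes "0:0:35 GMT" is one field;
the fields after the MSGID are left out.

```php
<?php
// Start of the first sample entry, rebuilt with explicit tabs.
// Assumption: the time and "GMT" form a single field; later fields are omitted.
$record = implode("\t", array(
    "2003-10-26", "0:0:35 GMT", "10.0.0.1", "file-server", "-",
    "Exchange-Server", "10.0.0.50", "user@domain.com", "1019",
    "Exchange-Server0123456789@Exchange-Server.domain.com"
));

$fields = explode("\t", $record);

// Index 9 is the tenth field, which holds the message ID.
echo $fields[9], "\n";
```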