Re: Help improving IO::Socket script response



Marc Bissonnette wrote:
Hi all;

I'm not sure where/what I've gone wrong here; The purpose of the code below is to get the server type (Apache/IIS/etc) from a list of URLs.
In the SQL query (whose results are used in the IO::Socket call), the data from 'cname' will be a string, like "Foo Internet", the data from 'mainurl' will be a url like "www.foo.com"

The problem lies in massive time-outs (I don't know what the longest is/can be: I've killed the process after ten minutes, however); The only sanity-saving option so far is to run it in batches of ten URLs, but I've got a list of ~400 to go through.

I've googled for somehow using a timeout with IO::Socket, but all the responses seem to be that timeout is deprecated and only applies to the handshake, not the listen/read. I'd be *really* happy if I could simply get this to work with giving each url 5 seconds to respond with the first 8 lines to get the Server: header.

I've trimmed my code to get rid of all the HTML output for legibility's sake; The reason for this little script is merely curiosity; A thread in can.internet.highspeed was asking what OS the majority of ISPs in Canada are running, so I thought I'd take a crack at it.

Many thanks for any provided insights or suggested reading! (I've gone over the CPAN entries for IO::Socket and IO::Socket::INET several times, but don't seem to be absorbing what it is that's causeing the timeouts)

#!/usr/bin/perl
use CGI;
use IO::Socket;
use URI::Escape;
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use strict;

my $row;
my $keys;
my $key;
my $sock;
my %hosts;
my $hosts;
my $host;
my $dbname;
my $dbuser;
my $dbpass;
my @row;
my $foo;
my $line;
my $count;
my $found;

Define them in the smallest scope.

Define();
init(); ???


my $query="SELECT `cname`,`mainurl` FROM `ispdata` WHERE `mainurl` IS NOT NULL AND `mainurl` != \"http://\"; AND `approved` = 1 ORDER BY `cname`";

No need for all that extra noise.

my $query = qq{
select cname, mainurl
from ispdata
where mainurl is not NULL and
mainurl != "http://"; and
approved = 1
order by cname
};
my $dbh;
my $sth;
$dbh = DBI->connect("DBI:mysql:$dbname", $dbuser, $dbpass,{PrintError => 1, RaiseError => 1});
$sth=$dbh->prepare($query);
$sth->execute();
while (@row=$sth->fetchrow()) {
$row[0]=uri_unescape($row[0]);
$row[1]=uri_unescape($row[1]);

Set these to something that makes sense.... $cname and $mainurl???

$foo='http://';
$row[1]=~ s/$foo//g;
Since you're using the same value, either define
$foo outside the while, probably using a better name,
or simply use it in the substitution.

my @url;
You could place that within the following if() and if you simply want the first item from split you don't need it at all.
$row[1] = ( split '/', $row[1] ) [0]; # or $row[1] =~ s{/.*}{};
if ($row[1] =~ /\//) {
@url=split /\//,$row[1];
$row[1]=$url[0];
}
$row[1]=~ s/ //g;
$hosts{$row[0]}=$row[1];
}
$sth->finish;
$dbh->disconnect;

For the rest I'd suggest taking a look at an article written by Randal Schwartz, especially the validate_links subroutine:

http://www.stonehenge.com/merlyn/UnixReview/col56.html

adding a call to HEAD, which you can parse as needed, you could replace your entire loop by a slightly modified version of that subroutine.
.