Re: Help improving IO::Socket script response
- From: Marc Bissonnette <dragnet\_@_/internalysis.com>
- Date: Tue, 28 Nov 2006 02:27:35 -0600
"J. Gleixner" <glex_no-spam@xxxxxxxxxxxxxxxxxxxxx> altered the spacetime
fabric by disgorging news:456b756a$0$503$815e3792@xxxxxxxxxxxxxx:
Marc Bissonnette wrote:
Hi all;
I'm not sure where/what I've gone wrong here; The purpose of the code
below is to get the server type (Apache/IIS/etc) from a list of URLs.
In the SQL query (whose results are used in the IO::Socket call), the
data from 'cname' will be a string, like "Foo Internet", the data
from 'mainurl' will be a url like "www.foo.com"
The problem lies in massive time-outs (I don't know what the longest
is/can be: I've killed the process after ten minutes, however); The
only sanity-saving option so far is to run it in batches of ten URLs,
but I've got a list of ~400 to go through.
I've googled for somehow using a timeout with IO::Socket, but all the
responses seem to be that timeout is deprecated and only applies to
the handshake, not the listen/read. I'd be *really* happy if I could
simply get this to work with giving each url 5 seconds to respond
with the first 8 lines to get the Server: header.
I've trimmed my code to get rid of all the HTML output for
legibility's sake; The reason for this little script is merely
curiosity; A thread in can.internet.highspeed was asking what OS the
majority of ISPs in Canada are running, so I thought I'd take a crack
at it.
Many thanks for any provided insights or suggested reading! (I've
gone over the CPAN entries for IO::Socket and IO::Socket::INET
several times, but don't seem to be absorbing what it is that's
causeing the timeouts)
#!/usr/bin/perl
use CGI;
use IO::Socket;
use URI::Escape;
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use strict;
my $row;
my $keys;
my $key;
my $sock;
my %hosts;
my $hosts;
my $host;
my $dbname;
my $dbuser;
my $dbpass;
my @row;
my $foo;
my $line;
my $count;
my $found;
Define them in the smallest scope.
Define();init(); ???
my $query="SELECT `cname`,`mainurl` FROM `ispdata` WHERE `mainurl` IS
NOT NULL AND `mainurl` != \"http://\" AND `approved` = 1 ORDER BY
`cname`";
No need for all that extra noise.
my $query = qq{
select cname, mainurl
from ispdata
where mainurl is not NULL and
mainurl != "http://" and
approved = 1
order by cname
};
my $dbh;
my $sth;
$dbh = DBI->connect("DBI:mysql:$dbname", $dbuser, $dbpass,{PrintError
=> 1, RaiseError => 1});
$sth=$dbh->prepare($query);
$sth->execute();
while (@row=$sth->fetchrow()) {
$row[0]=uri_unescape($row[0]);
$row[1]=uri_unescape($row[1]);
Set these to something that makes sense.... $cname and $mainurl???
$foo='http://';Since you're using the same value, either define
$row[1]=~ s/$foo//g;
$foo outside the while, probably using a better name,
or simply use it in the substitution.
my @url;You could place that within the following if() and if you simply want
the first item from split you don't need it at all.
$row[1] = ( split '/', $row[1] ) [0]; # or $row[1] =~ s{/.*}{};
if ($row[1] =~ /\//) {
@url=split /\//,$row[1];
$row[1]=$url[0];
}
$row[1]=~ s/ //g;
$hosts{$row[0]}=$row[1];
}
$sth->finish;
$dbh->disconnect;
For the rest I'd suggest taking a look at an article written by Randal
Schwartz, especially the validate_links subroutine:
http://www.stonehenge.com/merlyn/UnixReview/col56.html
adding a call to HEAD, which you can parse as needed, you could
replace your entire loop by a slightly modified version of that
subroutine.
Thanks to both you and Xho, I was able to refine the code a bit to make
the timeouts a bit more palateable in that I could watch the output and
remove ISPs from the DB who either weren't in business anymore or didn't
have a webserver up and running (which I am assuming is the same thing :)
)
I just couldn't wrap my head around the LWP::Parallel stuff, at least not
to get a timeout on the read/responses, but I got the desired end result,
though it took me quite a while :)
For those interested in the data collected, it's here, along with a thank
you mention to Xho and J Gleixner:
http://www.canadianisp.com/isp_server_types.html
The quick summary, however: (These are ISPs serving Canada, only)
ISPs queried: 355
ISPs who's servers responded: 322 (resulting in 33 ISPs being removed
from CanadianISP.com)
ISPs who's servers responded, but with no Server: information: 14
Server Responses:Apache 212 66% (We have a winner!)
Microsoft IIS 78 24%
Zeus 5 2%
Netscape Enterprise 2 0.6%
NCSA 1 0.3%
Sun ONE Webserver 1 0.3%
WebSTAR 1 0.3%
Zope 1 0.3%
Thanks again!
--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.
.
- References:
- Help improving IO::Socket script response
- From: Marc Bissonnette
- Re: Help improving IO::Socket script response
- From: J. Gleixner
- Help improving IO::Socket script response
- Prev by Date: Posting Guidelines for comp.lang.perl.misc ($Revision: 1.7 $)
- Next by Date: Translating RAD-50 code
- Previous by thread: Re: Help improving IO::Socket script response
- Next by thread: Simple regex question
- Index(es):
Relevant Pages
|
|