Re: Help improving IO::Socket script response



"J. Gleixner" <glex_no-spam@xxxxxxxxxxxxxxxxxxxxx> altered the spacetime
fabric by disgorging news:456b756a$0$503$815e3792@xxxxxxxxxxxxxx:

Marc Bissonnette wrote:
Hi all;

I'm not sure where/what I've gone wrong here; The purpose of the code
below is to get the server type (Apache/IIS/etc) from a list of URLs.
In the SQL query (whose results are used in the IO::Socket call), the
data from 'cname' will be a string, like "Foo Internet", the data
from 'mainurl' will be a url like "www.foo.com"

The problem lies in massive time-outs (I don't know what the longest
is/can be: I've killed the process after ten minutes, however); The
only sanity-saving option so far is to run it in batches of ten URLs,
but I've got a list of ~400 to go through.

I've googled for somehow using a timeout with IO::Socket, but all the
responses seem to be that timeout is deprecated and only applies to
the handshake, not the listen/read. I'd be *really* happy if I could
simply get this to work with giving each url 5 seconds to respond
with the first 8 lines to get the Server: header.

I've trimmed my code to get rid of all the HTML output for
legibility's sake; The reason for this little script is merely
curiosity; A thread in can.internet.highspeed was asking what OS the
majority of ISPs in Canada are running, so I thought I'd take a crack
at it.

Many thanks for any provided insights or suggested reading! (I've
gone over the CPAN entries for IO::Socket and IO::Socket::INET
several times, but don't seem to be absorbing what it is that's
causeing the timeouts)

#!/usr/bin/perl
use CGI;
use IO::Socket;
use URI::Escape;
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use strict;

my $row;
my $keys;
my $key;
my $sock;
my %hosts;
my $hosts;
my $host;
my $dbname;
my $dbuser;
my $dbpass;
my @row;
my $foo;
my $line;
my $count;
my $found;

Define them in the smallest scope.

Define();
init(); ???


my $query="SELECT `cname`,`mainurl` FROM `ispdata` WHERE `mainurl` IS
NOT NULL AND `mainurl` != \"http://\"; AND `approved` = 1 ORDER BY
`cname`";

No need for all that extra noise.

my $query = qq{
select cname, mainurl
from ispdata
where mainurl is not NULL and
mainurl != "http://"; and
approved = 1
order by cname
};
my $dbh;
my $sth;
$dbh = DBI->connect("DBI:mysql:$dbname", $dbuser, $dbpass,{PrintError
=> 1, RaiseError => 1});
$sth=$dbh->prepare($query);
$sth->execute();
while (@row=$sth->fetchrow()) {
$row[0]=uri_unescape($row[0]);
$row[1]=uri_unescape($row[1]);

Set these to something that makes sense.... $cname and $mainurl???

$foo='http://';
$row[1]=~ s/$foo//g;
Since you're using the same value, either define
$foo outside the while, probably using a better name,
or simply use it in the substitution.

my @url;
You could place that within the following if() and if you simply want
the first item from split you don't need it at all.
$row[1] = ( split '/', $row[1] ) [0]; # or $row[1] =~ s{/.*}{};
if ($row[1] =~ /\//) {
@url=split /\//,$row[1];
$row[1]=$url[0];
}
$row[1]=~ s/ //g;
$hosts{$row[0]}=$row[1];
}
$sth->finish;
$dbh->disconnect;

For the rest I'd suggest taking a look at an article written by Randal
Schwartz, especially the validate_links subroutine:

http://www.stonehenge.com/merlyn/UnixReview/col56.html

adding a call to HEAD, which you can parse as needed, you could
replace your entire loop by a slightly modified version of that
subroutine.


Thanks to both you and Xho, I was able to refine the code a bit to make
the timeouts a bit more palateable in that I could watch the output and
remove ISPs from the DB who either weren't in business anymore or didn't
have a webserver up and running (which I am assuming is the same thing :)
)

I just couldn't wrap my head around the LWP::Parallel stuff, at least not
to get a timeout on the read/responses, but I got the desired end result,
though it took me quite a while :)

For those interested in the data collected, it's here, along with a thank
you mention to Xho and J Gleixner:

http://www.canadianisp.com/isp_server_types.html

The quick summary, however: (These are ISPs serving Canada, only)

ISPs queried: 355
ISPs who's servers responded: 322 (resulting in 33 ISPs being removed
from CanadianISP.com)
ISPs who's servers responded, but with no Server: information: 14
Server Responses:Apache 212 66% (We have a winner!)
Microsoft IIS 78 24%
Zeus 5 2%
Netscape Enterprise 2 0.6%
NCSA 1 0.3%
Sun ONE Webserver 1 0.3%
WebSTAR 1 0.3%
Zope 1 0.3%

Thanks again!

--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.
.



Relevant Pages

  • Re: Is this an easy fix or complicated one?
    ... that box that opens, properties will be listed..if it isn't working, post ... >> within the required timeout. ... The server %1 did not register with DCOM within the required ... >> If the problem continues to occur, contact the program vendor. ...
    (microsoft.public.windowsxp.help_and_support)
  • Re: RWW timing out
    ... > RWW site as: ... > - Remote Web Workplace timeout ... Restart the IIS Service or restart the SBS Server. ...
    (microsoft.public.windows.server.sbs)
  • RE: RWW timing out
    ... Thank you for posting to the SBS Newsgroup. ... - Remote Web Workplace timeout ... Restart the IIS Service or restart the SBS Server. ...
    (microsoft.public.windows.server.sbs)
  • Re: SQL/ADO (2.8) Timeout Error - Cant resolve
    ... > timeout limit. ... > server, or the code itself or various other issues. ... You can either use Network Monitor to get a sniff of network ... the stored procedure code, then launch the sp from the Access database ...
    (microsoft.public.sqlserver.clients)
  • Re: SMTP Connector / Smart Host: I Need to get a smart host.
    ... > Verizon does require authentication. ... > settings, so I'm 99.999% sure it's accurate. ... > they want me relaying mail from my client's server through their server. ... Most ISPs don't do this. ...
    (microsoft.public.windows.server.sbs)