DB Fault Tolerance - network connections



Hi,

Perl v5.8, Red Hat Enterprise Linux 3 (Taroon Update 5), PostgreSQL 8.1.3,
DBI 1.52, DBD::Pg 1.49

We have a batch submission system where users submit jobs that are handled
by a Perl server, which talks to a PostgreSQL database on a different
physical system. The jobs may run for several hours, with constant
communication back and forth. We would like to build some (better,
configurable) fault tolerance into the system to handle specific types of
problems.

I am currently looking at the case where a server begins processing,
issuing SQL to PostgreSQL, and then the network suddenly goes down. I am
simulating this (in a small way) with a simple program that gets a
database handle, pauses for 20 seconds so that I can block the port on
the DB server, and then runs a query with $dbh->selectall_arrayref(). The
test seems to work fine. However, it takes approximately 16 minutes
before selectall_arrayref() times out and returns control. I would like
to lower this timeout so that I can take various actions sooner, but I
can't find a setting for it in the DBI or DBD::Pg documentation. I'm
hoping it's obvious and I'm just missing it.

Note that if I block the port before running my test program (i.e.
before I have a database handle), the connect times out in about 3
minutes. I'm assuming these timeouts can be adjusted somewhere on the
DBI/DBD::Pg side (as opposed to an OS setting, etc.). Also, FYI, if I
unblock the port at some point during the selectall_arrayref() call, it
finishes successfully, so I think my blocking/unblocking is working OK.

Note that I'm using iptables on the box that runs the PostgreSQL database
to block the incoming port. My DB is running on port 5488. Following is
my test program (which runs on a separate system from 'zing', the
hostname of the DB system), then my iptables commands, then three sample
runs so you can see the time difference. Any ideas on adjusting this
timeout, alternate test methods, etc. are most welcome. Thanks, Dylan.
Here's the info...
TESTPROG (run on my app system):
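#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $user = "myuser";
my $pw   = "mypw";
my ($dbh, $res);    # declare both as lexicals (was: my $dbh, $res;)
my %attr = (
    PrintError => 1,    # warn on errors...
    RaiseError => 0     # ...but don't die, so we can see when control returns
);

$dbh = DBI->connect("dbi:Pg:dbname=mydb1;host=zing;port=5488", $user,
    $pw, \%attr);
exit 1 unless $dbh;    # connect failed; PrintError has already reported why

# Window for blocking port 5488 on the DB server with iptables
print "Pausing for port block.....20secs\n";
sleep(20);
print "Unpausing\n";

$res = $dbh->selectall_arrayref("select name from mytesttable");
print "We are past the selectall_arrayref command\n";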
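A minimal sketch of one way to put an upper bound on the
selectall_arrayref() call above from the client side, regardless of the
TCP timeout: run the call under alarm() and give up when the alarm fires.
It assumes that Perl 5.8's deferred ("safe") signals may keep a plain
$SIG{ALRM} handler from running until libpq's blocking wait returns, so
the handler is installed with POSIX::sigaction instead, following the
approach described in the DBI documentation's section on signal handling.
run_with_timeout is a hypothetical helper, not part of DBI or DBD::Pg, and
the demo uses sleep in place of a database call so it runs without a
server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# run_with_timeout($seconds, $code): hypothetical helper, not part of DBI.
# Calls $code and abandons it after $seconds. Returns (1, $result) on
# success or (0, $error) if $code died or the alarm fired.
sub run_with_timeout {
    my ($seconds, $code) = @_;

    # Install the ALRM handler with sigaction rather than %SIG so that it is
    # not deferred ("safe" signals) while blocked inside a C library call.
    my $mask      = POSIX::SigSet->new(SIGALRM);
    my $action    = POSIX::SigAction->new(sub { die "timeout\n" }, $mask);
    my $oldaction = POSIX::SigAction->new();
    sigaction(SIGALRM, $action, $oldaction);

    my ($result, $err);
    eval {
        eval {
            alarm($seconds);
            $result = $code->();
        };
        alarm(0);        # cancel any pending alarm, whether or not $code died
        die $@ if $@;    # pass the inner error on to the outer eval
        1;
    } or $err = $@;

    sigaction(SIGALRM, $oldaction);    # restore the previous handler
    return defined $err ? (0, $err) : (1, $result);
}

# Demo without a database: a 5-second sleep stands in for a stalled query.
my ($ok, $msg) = run_with_timeout(1, sub { sleep 5; return "finished" });
print $ok ? "completed: $msg\n" : "gave up: $msg";
```

In the test program the query would then become something like
my ($ok, $res) = run_with_timeout(60, sub { $dbh->selectall_arrayref("select name from mytesttable") });
Dying out of the middle of a libpq call leaves the connection in an
unknown state, so after a timeout the handle is probably best discarded
and a fresh connection made.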

COMMAND TO ADD THE PORT BLOCK (run on DB system):
sudo iptables -A INPUT -j DROP -p tcp --destination-port 5488

COMMAND TO REMOVE THE BLOCK (assuming it is the first, and only, rule in
the INPUT chain):
sudo iptables -D INPUT 1

Sample run #1 - I just let it run, no blocked ports:
date; testprog.pl; date
Thu Nov 9 21:05:13 CST 2006
Pausing for port block.....20secs
Unpausing
We are past the selectall_arrayref command
Thu Nov 9 21:05:34 CST 2006

Sample run #2 - I block the port on the other sys during the sleep 20
(dbh exists):
date; testprog.pl; date
Thu Nov 9 21:06:02 CST 2006
Pausing for port block.....20secs
Unpausing
DBD::Pg::db selectall_arrayref failed: could not receive data from
server: Connection timed out
We are past the selectall_arrayref command
Thu Nov 9 21:21:57 CST 2006
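The length of run #2 (about 15.5 minutes once the 20-second pause is
subtracted) is consistent with the Linux kernel giving up on an
established TCP connection after exhausting its tcp_retries2
retransmissions, which would make this an OS setting rather than a
DBI/DBD::Pg one. A minimal sketch of the arithmetic, assuming the model
given for tcp_retries2 in the kernel's ip-sysctl documentation: the
retransmission timeout (RTO) starts at 0.2 s, doubles after each
retransmission, is capped at 120 s, and the connection is dropped at the
(N+1)th RTO. With the default N = 15 this gives 924.6 s, which the
documentation describes as a lower bound, since the real RTO also depends
on the measured round-trip time. hypothetical_tcp_timeout is an
illustrative name, not an existing API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical_tcp_timeout($n): lower-bound estimate, in seconds, of how
# long Linux keeps retransmitting on an established connection before
# dropping it. RTO starts at TCP_RTO_MIN (0.2 s), doubles after each
# retransmission, and is capped at TCP_RTO_MAX (120 s).
sub hypothetical_tcp_timeout {
    my ($retries) = @_;
    my ($rto_min, $rto_max) = (0.2, 120);
    my ($total, $rto) = (0, $rto_min);
    # One wait after the original send plus one after each of the n
    # retransmissions: n + 1 waits in total.
    for (0 .. $retries) {
        $total += $rto;
        $rto = 2 * $rto < $rto_max ? 2 * $rto : $rto_max;
    }
    return $total;
}

# Use the host's current setting when /proc is available (Linux),
# otherwise the kernel default of 15.
my $retries = 15;
if (open my $fh, '<', '/proc/sys/net/ipv4/tcp_retries2') {
    my $line = <$fh>;
    close $fh;
    $retries = $1 if defined $line && $line =~ /^(\d+)/;
}

my $secs = hypothetical_tcp_timeout($retries);
printf "tcp_retries2=%d: at least %.1f s (about %.1f min)\n",
    $retries, $secs, $secs / 60;
```

Lowering net.ipv4.tcp_retries2 would shorten this for every TCP
connection on the client host, not just the database one, so a per-call
timeout may be the less intrusive option.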


Sample run #3 - I block the port on the other system prior to running
the test prog:
date; testprog.pl; date
Thu Nov 9 21:00:47 CST 2006
DBI connect('dbname=mydb;host=zing;port=5488','myuser',...) failed:
could not connect to server: Connection timed out
Is the server running on host "zing" and accepting
TCP/IP connections on port 5488?
at ./testprog.pl line 12
Thu Nov 9 21:03:57 CST 2006
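For the connect case in run #3 (about 3 minutes 10 seconds), libpq has a
connect_timeout connection parameter, given in seconds. Assuming DBD::Pg
passes the part of the DSN after "dbi:Pg:" to libpq as connection
parameters, it could be added directly to the DSN. pg_dsn is a
hypothetical helper; the sketch only builds and prints the DSN, and the
connect call is left as a comment so the snippet runs without DBI or a
server. connect_timeout only limits establishing the connection, so it
would not help with a query on an already-open handle as in run #2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# pg_dsn(%param): hypothetical helper that joins libpq-style key=value
# pairs into a DBD::Pg DSN. Keys are sorted so the output is deterministic.
sub pg_dsn {
    my (%param) = @_;
    return "dbi:Pg:" . join(";", map { "$_=$param{$_}" } sort keys %param);
}

# Assumption: connect_timeout (seconds) is handed through to libpq.
my $dsn = pg_dsn(
    dbname          => "mydb1",
    host            => "zing",
    port            => 5488,
    connect_timeout => 10,
);
print "$dsn\n";

# With DBI installed, the connect in the test program would become:
#   $dbh = DBI->connect($dsn, $user, $pw, \%attr);
```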
