Zombie handles when trapped by a signal
From: Henri Asseily (henri_at_bizrate.com)
Date: 11/25/04
- Next message: Michael A Chase tech: "Re: prepared statement: automatically removed trailing spaces"
- Previous message: David N Murray: "RE: Why wont my script finish?"
- Next in thread: Lincoln A. Baxter: "Re: Zombie handles when trapped by a signal"
- Reply: Lincoln A. Baxter: "Re: Zombie handles when trapped by a signal"
- Reply: Michael Peppler: "Re: Zombie handles when trapped by a signal"
- Reply: Tim Bunce: "Re: Zombie handles when trapped by a signal"
Date: Wed, 24 Nov 2004 16:52:08 -0800
To: dbi-users@perl.org
I have slammed into a wall in my quest for reliable failover and high
availability in DBI. I don't know if this discussion should be in
dbi-users or dbi-dev, but here goes:
High availability necessitates a good timeout handling system. If
execution of an sql statement or stored procedure takes too long, one
should have the opportunity to kill it and fail over to a less
overloaded server.
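One problem is timeout handling in Perl (and Unix in general).
The standard $SIG{ALRM} technique fails completely when trying to
interrupt $sth->execute(): since Perl 5.8 defers signal delivery until
the current operation finishes, the handler is never triggered while
the call is blocked inside the driver.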
That problem has now been resolved thanks to Lincoln Baxter's excellent
Sys::SigAction module (at least for Unix machines) which utilizes all
the techniques (POSIX sigaction, SIGALRM...) to ensure proper signal
handling.
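
To illustrate the underlying technique without Sys::SigAction, here is a
minimal sketch that uses only the core POSIX module. The helper name
with_timeout is hypothetical, and a blocking select() stands in for
$sth->execute(). It assumes that a handler installed through
POSIX::sigaction is delivered immediately instead of being deferred, as
perlipc describes; it is not the code Sys::SigAction itself uses.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code with a time limit of $seconds.
# Returns ($result, $timed_out).
sub with_timeout {
    my ($seconds, $code) = @_;
    my $timed_out = 0;
    my $result;

    # Install the handler with POSIX sigaction so that it is not
    # subject to Perl's deferred ("safe") signal delivery.
    my $mask   = POSIX::SigSet->new(SIGALRM);
    my $action = POSIX::SigAction->new(
        sub { $timed_out = 1; die "TIMEOUT\n" },
        $mask,
    );
    my $old = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $action, $old)
        or die "sigaction failed: $!";

    eval {
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    my $err = $@;
    alarm(0);                           # make sure no alarm is left pending
    POSIX::sigaction(SIGALRM, $old);    # restore the previous handler

    die $err if $err && !$timed_out;    # rethrow unrelated errors
    return ($result, $timed_out);
}

# A 5-second blocking call, limited to 1 second.
my ($res, $timed_out) = with_timeout(1, sub {
    select(undef, undef, undef, 5);
    return 'done';
});
print $timed_out ? "timed out\n" : "finished: $res\n";
```

Dying from an immediately delivered handler is what interrupts the
blocking call, and it is also what can leave C-level state half-finished.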
But there's another more subtle problem that I only today finally
managed to get to the bottom of:
Assuming you use Sys::SigAction and you properly trap the execute()
call, you get nailed by DBI's aggressive sanity checking.
Suppose you have code like the following (copied from my upcoming
DBIx::HA 0.9x module):
    eval {
        my $h = set_sig_handler(
            'ALRM',
            sub { $timeout = 1; die 'TIMEOUT'; },
            { mask => ['ALRM'],
              safe => 1 }
        );
        alarm(10);
        $res = $sth->SUPER::execute;
        alarm(0);
    };
    alarm(0);
If the alarm is triggered, your statement handle ($sth) is left in a
corrupted state with no way to get rid of it. Each such handle stays
counted as an active kid of your database handle, so every timeout adds
another one and the database handle's bookkeeping ends up corrupted too.
Below is the result of triggering the above alarm:
    null: (in cleanup) dbih_setup_fbav: invalid number of fields:
        -1, NUM_OF_FIELDS attribute probably not set right at ....
    null: DBI handle 0xabf1038 cleared whilst still active at ...
    null: DBI handle 0xabf1038 has uncleared implementors data at ...
        dbih_clearcom (sth 0xabf1038, com 0xaeb79b8, imp DBD::Sybase::st):
            FLAGS 0x180057: COMSET IMPSET Active Warn ChopBlanks PrintWarn
            PARENT DBIx::HA::db=HASH(0xa21e008)
            KIDS 0 (0 Active)
            IMP_DATA undef
            LongReadLen 32768
            NUM_OF_FIELDS -1
            NUM_OF_PARAMS 0
The statement handle was created but was never populated with the
execution results, so it's in a weird half-alive state.
For example, the DBIc_NUM_FIELDS is -1, which makes dbih_setup_fbav()
croak. Similarly, DBIc_ACTIVE is still true.
Should there be an additional field for a handle that tells us if it's
not in a fully active state, and if so then we have carte blanche to
wipe it?
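
As a rough illustration of such a check done from Perl code, here is a
minimal sketch. The helper name looks_like_zombie is hypothetical. It
reads the Active and NUM_OF_FIELDS handle attributes, and it assumes that
a half-executed handle shows up at the Perl level the same way it does in
the dump above (Active true, NUM_OF_FIELDS negative or unset), which I
have not verified for every driver. Plain hashes stand in for real
statement handles so that the example runs without a database.

```perl
use strict;
use warnings;

# Hypothetical helper: guess whether a statement handle was left
# half-executed, i.e. marked Active but with no usable field count.
sub looks_like_zombie {
    my ($sth) = @_;
    my $active = $sth->{Active};
    my $fields = $sth->{NUM_OF_FIELDS};

    return 0 unless $active;
    return (!defined $fields || $fields < 0) ? 1 : 0;
}

# Stand-ins for statement handles (assumption: real handles expose the
# same two attributes through the usual hash interface).
my $zombie  = { Active => 1, NUM_OF_FIELDS => -1 };
my $healthy = { Active => 1, NUM_OF_FIELDS => 3 };

for my $h ($zombie, $healthy) {
    print looks_like_zombie($h) ? "zombie\n" : "ok\n";
}
```

This only detects the state; it does not make the handle safe to reuse
or to destroy.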
What's the best strategy to deal with these zombies?
I can provide a patch when I dig deeper.
H