Zombie handles when trapped by a signal

From: Henri Asseily (henri_at_bizrate.com)
Date: 11/25/04


Date: Wed, 24 Nov 2004 16:52:08 -0800
To: dbi-users@perl.org

I have slammed into a wall in my quest for reliable failover and high
availability in DBI. I don't know if this discussion should be in
dbi-users or dbi-dev, but here goes:

High availability necessitates a good timeout handling system. If
execution of an SQL statement or stored procedure takes too long, one
should have the opportunity to kill it and fail over to a less
overloaded server.

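One problem is timeout handling in Perl (and Unix in general).
The standard $SIG{ALRM} technique utterly fails when trying to interrupt
$sth->execute(): the handler never gets triggered while the call is
blocked, presumably because Perl's deferred ("safe") signals only run
handlers between opcodes and the driver is stuck inside its C client
library.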
That problem has now been resolved thanks to Lincoln Baxter's excellent
Sys::SigAction module (at least for Unix machines) which utilizes all
the techniques (POSIX sigaction, SIGALRM...) to ensure proper signal
handling.
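
For readers who have not seen the technique, here is a rough sketch of the
underlying idea using only the core POSIX module. It is not the
Sys::SigAction implementation; run_with_timeout is a hypothetical helper
name of my own. The assumption is that a handler installed through
POSIX::sigaction without the "safe" flag is delivered immediately, even
while Perl is blocked in a system call or C library call.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper (not part of Sys::SigAction or DBI): run $code with a
# timeout of $secs seconds.
# Returns (1, @results) on success and (0) if the alarm fired.
sub run_with_timeout {
    my ($secs, $code) = @_;
    my $timed_out = 0;

    # Install the handler via sigaction(2) instead of %SIG, masking ALRM
    # while the handler itself runs.
    my $new = POSIX::SigAction->new(
        sub { $timed_out = 1; die "TIMEOUT\n" },
        POSIX::SigSet->new(SIGALRM),
    );
    my $old = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $new, $old) or die "sigaction failed: $!";

    my @result;
    my $ok = eval {
        alarm($secs);
        @result = $code->();
        alarm(0);            # finished in time; cancel the pending alarm
        1;
    };
    my $err = $@;
    alarm(0);                        # never leave an alarm pending
    POSIX::sigaction(SIGALRM, $old); # restore the previous handler

    die $err if !$ok && !$timed_out; # rethrow errors that are not timeouts
    return $timed_out ? (0) : (1, @result);
}
```

A call would look like my ($ok, @rows) = run_with_timeout(10, sub { ... });
with $ok false meaning the alarm fired before the code returned.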

But there's another more subtle problem that I only today finally
managed to get to the bottom of:

Assuming you use Sys::SigAction and you properly trap the execute()
call, you get nailed by DBI's aggressive sanity checking.

Suppose you have code like the following (copied from my upcoming
DBIx::HA 0.9x module):

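use Sys::SigAction qw(set_sig_handler);

eval {
    # $h restores the previous ALRM handler when it goes out of scope
    my $h = set_sig_handler(
             'ALRM',
             sub { $timeout = 1; die 'TIMEOUT'; },
             { mask=>['ALRM'],
             safe=>1 }
           );
    alarm(10);
    $res = $sth->SUPER::execute;
    alarm(0);   # execute returned in time; cancel the pending alarm
};
alarm(0);       # make sure no alarm is left pending after the eval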

If the alarm is triggered, then your statement handle ($sth) gets
automatically corrupted with no way to get rid of it. This in turn will
continuously add active kids to your database handle and corrupt everything.
Below is the result of triggering the above alarm:

  null: (in cleanup) dbih_setup_fbav: invalid number of fields: -1, NUM_OF_FIELDS attribute probably not set right at ....

  null: DBI handle 0xabf1038 cleared whilst still active at ...

  null: DBI handle 0xabf1038 has uncleared implementors data at ...
     dbih_clearcom (sth 0xabf1038, com 0xaeb79b8, imp DBD::Sybase::st):
        FLAGS 0x180057: COMSET IMPSET Active Warn ChopBlanks PrintWarn
        PARENT DBIx::HA::db=HASH(0xa21e008)
        KIDS 0 (0 Active)
        IMP_DATA undef
        LongReadLen 32768
        NUM_OF_FIELDS -1
        NUM_OF_PARAMS 0

The statement handle was created but was never populated with the
execution results, so it's in a weird half-alive state.
For example, the DBIc_NUM_FIELDS is -1, which makes dbih_setup_fbav()
croak. Similarly, DBIc_ACTIVE is still true.
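
To make the symptom concrete, here is a small diagnostic sketch that checks
for the combination described above from the Perl side. It only reads the
documented DBI attributes Active, NUM_OF_FIELDS, Kids and ActiveKids. The
helper names are hypothetical, and it assumes the -1 seen in the dump is
also visible through $sth->{NUM_OF_FIELDS}, which I have not verified for
every driver.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the statement handle claims to be Active
# while reporting a negative field count, as in the dump above.
sub looks_like_zombie {
    my ($sth) = @_;
    my $active  = $sth->{Active} ? 1 : 0;
    my $nfields = $sth->{NUM_OF_FIELDS};   # may be undef in some drivers
    return ($active && defined $nfields && $nfields < 0) ? 1 : 0;
}

# Hypothetical helper: one-line summary of handle state for logging.
sub describe_handles {
    my ($dbh, $sth) = @_;
    my $nfields = $sth->{NUM_OF_FIELDS};
    return sprintf(
        "sth Active=%d NUM_OF_FIELDS=%s; dbh Kids=%d ActiveKids=%d",
        $sth->{Active} ? 1 : 0,
        defined $nfields ? $nfields : 'undef',
        $dbh->{Kids}       || 0,
        $dbh->{ActiveKids} || 0,
    );
}
```

Calling describe_handles($dbh, $sth) after each trapped timeout should show
whether ActiveKids keeps growing on the database handle.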

Should there be an additional field for a handle that tells us if it's
not in a fully active state, and if so then we have carte blanche to
wipe it?
What's the best strategy to deal with these zombies?

I can provide a patch when I dig deeper.

H


