Re: Zombie handles when trapped by a signal

From: Henri Asseily (henri_at_shopzilla.com)
Date: 11/25/04


Date: Wed, 24 Nov 2004 22:36:27 -0800
To: lab@lincolnbaxter.com

On Nov 24, 2004, at 10:14 PM, Lincoln A. Baxter wrote:

> Hi Henri,
>
> I have some questions/avenues for you to pursue:
>
> 1) What happens when you change safe=>1 to safe=>0 in this code?

You end up getting the same as the standard $SIG{ALRM} behavior, i.e.
the alarm never triggers.

>
> 2) What happens if you close the entire dbh at this point (and reopen it
> later)? -- it's a thought.

I don't know, but I certainly do not want that (which is why I didn't
try it). The concept is to do the execute with a timeout. If the
timeout triggers, retry a "select 1". If that fails, then we assume the
db is dead and switch to another one. If it succeeds, then either the
statement is wrong or the database is overloaded, and I still have to
determine the correct course of action. In that second case, though,
automatically switching to another database server is not correct.
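
A rough sketch of that decision, just to make the flow concrete. The sub name after_timeout and its return values are made up for illustration and are not part of DBIx::HA; $ping is assumed to be a code ref that runs "select 1" under its own timeout and returns true on success:

```perl
use strict;
use warnings;

# Hypothetical helper: decide what to do once execute() has timed out.
# $ping is an assumed code ref that runs "select 1" (with its own
# timeout) and returns true if the server answered.
sub after_timeout {
    my ($ping) = @_;
    my $alive = eval { $ping->() };    # a die inside the ping counts as failure
    return 'failover' unless $alive;   # "select 1" failed: assume the db is dead
    return 'undecided';                # db answers: bad statement or overload
}
```

The 'undecided' branch is deliberately left open, since the right action there is still to be determined.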

>
> 3) Which DBD(s) have you tested this with? If more than one, does the
> problem occur with all the DBDs you have tried? (Which ones have you
> tried?) I would want to know whether this is DBD behavior or DBI code
> that is freaking out or getting corrupted by the signal. We could be
> dealing with stacked signal handlers. Check the source of the DBD you
> are using for the use of signal() or sigaction().

I've been using DBD::Sybase exclusively at this point, with ASE and
Sybase IQ.

>
> 4) Have you considered looking at the output with DBI_TRACE=n, where n is
> greater than 0 -- you can turn trace on just before the statement with

Yep, that's my next step. That will help me determine what to
patch in DBI.
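
A minimal sketch of scoping the trace to just the one call, assuming the handle-level trace() method from DBI. The sub name traced_execute is made up here and is not a DBI function:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): run execute() with tracing
# raised to $level, then restore the handle's previous trace setting.
sub traced_execute {
    my ($sth, $level, @bind) = @_;
    my $old = $sth->trace;             # current trace setting of this handle
    $sth->trace($level);               # raise tracing for this handle only
    my $res = eval { $sth->execute(@bind) };
    my $err = $@;
    $sth->trace($old);                 # restore even if execute() died
    die $err if $err;                  # re-throw, e.g. the TIMEOUT from the handler
    return $res;
}
```

Restoring the old level inside the helper keeps the trace output limited to the statement under suspicion, even when the alarm handler dies out of execute().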

> 5) If you are running on Linux, can you do this with strace and show us
> the output? If on Sun, same question but use truss. (I don't recall
> what does the equivalent on AIX or HP-UX.)

Will do as well.

> On Wed, 2004-11-24 at 16:52 -0800, Henri Asseily wrote:
>> I have slammed into a wall in my quest for reliable failover and high
>> availability in DBI. I don't know if this discussion should be in
>> dbi-users or dbi-dev, but here goes:
>>
>> High availability necessitates a good timeout handling system. If
>> execution of an sql statement or stored procedure takes too long, one
>> should have the opportunity to kill it and fail over to a less
>> overloaded server.
>>
>> One problem is in the timeout handling in Perl (and Unix in general).
>> The standard $SIG{ALRM} technique utterly fails when trying to trap
>> $sth->execute(), and never gets triggered.
>> That problem has now been resolved thanks to Lincoln Baxter's excellent
>> Sys::SigAction module (at least for Unix machines), which utilizes all
>> the techniques (POSIX sigaction, SIGALRM...) to ensure proper signal
>> handling.
>>
>> But there's another more subtle problem that I only today finally
>> managed to get to the bottom of:
>>
>> Assuming you use Sys::SigAction and you properly trap the execute()
>> call, you get nailed by DBI's aggressive sanity checking.
>>
>> Suppose you have code like the following (copied from my upcoming
>> DBIx::HA 0.9x module):
>>
>> eval {
>>     my $h = set_sig_handler(
>>         'ALRM',
>>         sub { $timeout = 1; die 'TIMEOUT'; },
>>         { mask=>['ALRM'],
>>           safe=>1 }
>>     );
>>     alarm(10);
>>     $res = $sth->SUPER::execute;
>>     alarm(0);
>> };
>> alarm(0);
>>
>>
>> If the alarm is triggered, then your statement handle ($sth) gets
>> automatically corrupted with no way to get rid of it. This in turn will
>> continuously add active kids to your database handle and corrupt
>> everything.
>> Below is the result of triggering the above alarm:
>>
>> null: (in cleanup) dbih_setup_fbav: invalid number of fields:
>> -1, NUM_OF_FIELDS attribute probably not set right at ....
>>
>> null: DBI handle 0xabf1038 cleared whilst still active at ...
>>
>> null: DBI handle 0xabf1038 has uncleared implementors data at ...
>> dbih_clearcom (sth 0xabf1038, com 0xaeb79b8, imp DBD::Sybase::st):
>> FLAGS 0x180057: COMSET IMPSET Active Warn ChopBlanks PrintWarn
>> PARENT DBIx::HA::db=HASH(0xa21e008)
>> KIDS 0 (0 Active)
>> IMP_DATA undef
>> LongReadLen 32768
>> NUM_OF_FIELDS -1
>> NUM_OF_PARAMS 0
>>
>>
>> The statement handle was created but was never populated with the
>> execution results, so it's in a weird half-alive state.
>> For example, the DBIc_NUM_FIELDS is -1, which makes dbih_setup_fbav()
>> croak. Similarly, DBIc_ACTIVE is still true.
>>
>> Should there be an additional field for a handle that tells us if it's
>> not in a fully active state, and if so then we have carte blanche to
>> wipe it?
>> What's the best strategy to deal with these zombies?
>>
>> I can provide a patch when I dig deeper.
>>
>> H
> --


