Re: Zombie handles when trapped by a signal

From: Lincoln A. Baxter (lab_at_lincolnbaxter.com)
Date: 11/25/04

    To: Henri Asseily <henri@shopzilla.com>
    Date: Thu, 25 Nov 2004 12:36:19 -0500
    
    

    Hi Henri,

    I have read the other posts in this thread, and it does sound like (at
    least) a bug in the Sybase driver. Now that I know that the Sybase
    client has an internal timeout feature, I am more suspicious that we are
    running into multiple handlers on SIGALRM (one in the Sybase code).
    Although, from a quick scan of the strace output, they could have
    implemented that with select(). (Now... I can't remember if select
    itself uses alarms... though I did not think it did.)

    On Wed, 2004-11-24 at 22:36 -0800, Henri Asseily wrote:
    > On Nov 24, 2004, at 10:14 PM, Lincoln A. Baxter wrote:
    >
    > > Hi Henri,
    > >
    > > I have some questions/avenues for you to pursue:
    > >
    > > 1) What happens when you change safe=>1 to safe=>0 in this code?
    >
    > You end up getting the same as the standard $SIG{ALRM} behavior, i.e.
    > the alarm never triggers.
    >

    Hmmm, I think I want to take a closer look at that... I do suspect that
    we are running into an issue with Sybase signal handling in addition to
    other things. But I want to do a little testing of Sys::SigAction's
    safe flag in this case. Can you put together a script that does this,
    which I could try against our DBD-Oracle? (And send me your latest HA
    module, if it is needed and it is not on CPAN.)

    > >
    > > 2) What happens if you close the entire dbh at this point (and reopen
    > > it later)? It's a thought.
    >
    > I don't know, but I certainly do not want that (which is why I didn't
    > try it). The concept is to do the execute with a timeout. If the
    > timeout triggers, retry a "select 1". If that fails, then we assume the
    > db is dead and switch to another one. If it succeeds, then either the
    > statement is wrong or the database is overloaded, and I still have to
    > determine the correct course of action. But switching to another
    > database server automatically is not correct.

    Tim recommended cleaning up the entire dbh in another message, and I
    would too, even after the DBD-Sybase bug is fixed. Even with the safe
    flag, we have to assume that signals are inherently unsafe. That is how
    we handle all database timeouts in the code we have written.

    I think that doing a "select 1" after a timeout should really be
    revisited. What are you going to do if that succeeds: do the original
    execute again? What if that hangs again? Are you keeping a counter? If
    so, I think you are headed down the wrong path. I think you should
    immediately give up and clean up. Tim's comments about possible
    corruption from using signals notwithstanding, if you have timed out a
    database operation, it is probably because the operation is flawed in
    its design, or because the DB is sick or far too busy. Anything else
    you do (other than closing the connection) has the potential to make it
    worse (even select 1). Closing the database connection is about the
    only safe thing you can do on the client side that _MIGHT_ make it
    better, primarily because it gives the DB engine a chance to reclaim
    some resources and heal itself.

    Lincoln

