RE: gc assertion failure

From: Tim Peters (tim.one_at_comcast.net)
Date: 10/30/03


Date: Wed, 29 Oct 2003 22:48:53 -0500
To: <python-list@python.org>


[Todd Miller]
>>>> python: Modules/gcmodule.c:231: visit_decref: Assertion
>>>> `gc->gc.gc_refs != 0' failed. Abort (core dumped)

>>>> ...
>>>> #5 0x080e9222 in visit_decref (op=0x405adc74, data=0x0) at
>>>> Modules/gcmodule.c:231
>>>> #6 0x0808cebf in tupletraverse(o=0x40a62f74, visit=0x80e9194
>>>> <visit_decref>, arg=0x0) at Objects/tupleobject.c:398

> FWIW, here's what my bug looked like:
>
> < key = Py_BuildValue("(NNsNN)", _digest(in1), _digest(out),
> cumop, thread_id, type
> ---
> > key = Py_BuildValue("(NNsNO)", _digest(in1), _digest(out),
> cumop, thread_id, type
>
> Since I used "N" for type in the Py_BuildValue, it stole a reference
> to type which it shouldn't have. Switching to "O" made the
> Py_BuildValue reference count neutral for type and the problem was
> solved.

That sure fits the pattern, but I'm baffled as to whether to call this one a
missing incref or an excess decref <0.8 wink>.

> Thanks for the help,

Oh, I've seen this assert fail many times in development code. It's worth
running a debug-build Python just to get it, since it's the only check in
the code that *can* catch a too-small refcount before the refcount falls to
0 and the object vanishes (after which point it can be very hard even to
figure out what type the object had). Unfortunately, it can still be
excruciating to track down the cause, because the complaint from gc comes at
seemingly random times, and the code that failed to incref (or erroneously
decrefed) has nothing to do with what's on the call stack at the time the
assert triggers.

Next time, set the gc threshold to 1 (gc.set_threshold(1) from Python). This
is much slower, because gc then triggers on every container allocation. The
good part is that it
generally finds the too-small refcount much closer to the time an object
grows an unaccounted-for reference. Then, with some luck, you only have to
reconstruct what happened in the preceding 5 million measly little machine
cycles <wink>.
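
A minimal sketch of that threshold trick, assuming you can drive the suspect
extension code from a Python test script. The helper names and the placeholder
workload are made up for illustration; only gc.get_threshold() and
gc.set_threshold() are real gc-module calls. Run it under a debug-build Python
so the assert is actually compiled in.

```python
import gc


def enable_aggressive_gc():
    """Lower the generation-0 threshold to 1; return the old thresholds."""
    previous = gc.get_threshold()
    # Only threshold0 changes; the other two thresholds keep their values.
    gc.set_threshold(1)
    return previous


def restore_gc(previous):
    """Put back the thresholds saved by enable_aggressive_gc()."""
    gc.set_threshold(*previous)


previous = enable_aggressive_gc()
try:
    during = gc.get_threshold()
    # Placeholder workload: call the suspect extension code here instead.
    # Any container allocations give gc frequent chances to run.
    junk = [[i] for i in range(1000)]
finally:
    restore_gc(previous)
```

Restoring the old thresholds afterwards keeps the slowdown confined to the
code under suspicion.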

brains-don't-scale-ly y'rs - tim


