Re: Deadlock resolution

From: Dmitry A. Kazakov (mailbox_at_dmitry-kazakov.de)
Date: 07/28/04


Date: Wed, 28 Jul 2004 16:21:08 +0200

On Wed, 28 Jul 2004 14:53:02 +0100, Nick Roberts wrote:

> My current design for the AdaOS kernel is as follows. The kernel
> counts the time that each thread (task) is blocked waiting for
> another thread (in the same workstation). If the count reaches a
> certain threshold (possibly about 10 minutes) for a thread, the
> kernel performs a traceback on the chain of threads it is waiting
> on; this chain breaks if any thread in it is waiting on I/O or a
> timer. If the chain doesn't break before the original thread is
> reached again, deadlock is detected.

Don't you have any protected objects a task might wait on (apart from the
timer and I/O events you mentioned)? So is rendezvous with tasks the only
synchronization mechanism?
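The traceback in the quoted design can be sketched as a walk along a wait-for map, with the youngest-thread victim rule from the same design added as a second function. This is only an illustration, not the Bachar kernel's code. The names `Wait`, `trace_deadlock` and `pick_victim` are hypothetical, and so is the representation: each thread maps to the thread it waits on, to an I/O or timer marker, or to `None` if it is not blocked. The ten-minute trigger is left out. One assumption goes beyond the post: if the chain runs into a cycle that does not include the starting thread, the sketch reports nothing and leaves that cycle to be found from one of its own members.

```python
from enum import Enum
from typing import Dict, List, Optional, Union


class Wait(Enum):
    """Waits that break the chain: something outside the chain can end them."""
    IO = "io"
    TIMER = "timer"


# Hypothetical representation: thread name -> another thread's name,
# a Wait marker, or None if the thread is not blocked at all.
WaitsOn = Dict[str, Union[str, Wait, None]]


def trace_deadlock(waits_on: WaitsOn, origin: str) -> Optional[List[str]]:
    """Follow the wait-for chain starting at `origin`.

    Returns the threads in the cycle if the chain leads back to `origin`,
    or None if the chain breaks first.
    """
    chain = [origin]
    current = waits_on.get(origin)
    while isinstance(current, str):      # still waiting on another thread
        if current == origin:
            return chain                 # back at the start: deadlock
        if current in chain:
            return None                  # assumption: cycle excludes origin
        chain.append(current)
        current = waits_on.get(current)
    return None                          # broken by I/O, a timer, or a running thread


def pick_victim(cycle: List[str], created_at: Dict[str, float]) -> str:
    """Select the youngest (most recently created) thread in the cycle."""
    return max(cycle, key=lambda t: created_at[t])
```

For example, with `{"A": "B", "B": "C", "C": "A"}` the trace from `"A"` returns `["A", "B", "C"]`, while `{"A": "B", "B": Wait.IO}` returns `None` because B's I/O wait breaks the chain.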

> The current 'resolution' strategy is to select the youngest (most
> recently created) thread in the chain, and kill it (it gets a
> special Deadlock exception).

How does it differ from a timed entry call from the perspective of the task
being killed? Apart from waiting on a timer, I don't see much difference.
If so, then one can simply claim that all calls are time-bounded (say, by 10
minutes).
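The equivalence suggested here can be sketched outside Ada: a call that gives up if it is not served within a fixed time looks, to the caller, much like being killed with a Deadlock exception. In Ada this would be a timed entry call (`select ... or delay ...`). The Python sketch below stands in for it by using `threading.Lock.acquire(timeout=...)` as the "entry". `CallTimedOut` and `bounded_call` are hypothetical names, and modelling an entry as a lock is a simplifying assumption.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class CallTimedOut(Exception):
    """Hypothetical stand-in for the kernel's special Deadlock exception."""


def bounded_call(entry: threading.Lock, body: Callable[[], T], timeout: float) -> T:
    """Run `body` once `entry` is acquired, or give up after `timeout` seconds.

    From the caller's side, giving up here is indistinguishable from being
    chosen as a deadlock victim: either way the call ends in an exception.
    """
    if not entry.acquire(timeout=timeout):
        raise CallTimedOut(f"call not served within {timeout} s")
    try:
        return body()
    finally:
        entry.release()
```

With a ten-minute limit, `bounded_call(entry, body, 600.0)` gives every call the same upper bound that the kernel's detector would impose.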

> There are at least two weaknesses with this approach: (a) it takes
> at least 10 minutes (or whatever) to detect a deadlock; (b) any
> chain of deadlock that goes outside the workstation at all will
> not be detected by this mechanism.
>
> I think (a) is unlikely to be a serious problem, in practice.
>
> However, since AdaOS will be a fully distributed OS, (b) is very
> likely to happen, in practice. I believe that the kernel cannot be
> expected to manage deadlocks in these cases, because it would be
> impractical to implement a mechanism that could be guaranteed
> immune to false intervention (to act only in cases of a genuine
> deadlock).

Theoretically you could navigate the chain of tasks across the whole
network... (:-))

> I therefore believe that super-kernel software which might be in
> communication with software executing on another workstation
> (and in AdaOS, that means most software) must be programmed to
> either: (1) eliminate (within reason) the possibility of
> deadlock; (2) detect and resolve potential deadlocks. I think
> (1) will be impractical in the majority of AdaOS programs.
>
> When considering the likely AdaOS (actually CORBA) scenario of
> a menagerie of programs interacting in potentially deadlocking
> ways, I think considerations of deadlock management gain in
> importance. So, I'm pondering the subject as it impacts
> the design of the AdaOS kernel (which is called 'Bachar').

It would be interesting to speculate whether some kind of profile (like
Ravenscar), plus a richer set of protected object operations, could eliminate
deadlocks while still being powerful enough for a universal OS. For example,
the famous dining philosophers wouldn't deadlock if two protected objects
(forks) could share one protected action. Just a guess.
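The dining-philosophers remark can be illustrated with a sketch. Ada offers no single protected action spanning two separate protected objects, so the sketch emulates one by putting all forks under one monitor: a philosopher waits until both forks are free and then takes them in a single atomic step. Nobody ever holds one fork while waiting for the other, so the circular wait behind the classic deadlock cannot form. Python's `threading.Condition` plays the role an Ada entry barrier would play. `Table` and `dine` are hypothetical names, and merging the forks into one monitor is an assumption about how such a shared action could be realised.

```python
import threading
from typing import List


class Table:
    """One monitor guards all forks, so both forks are taken atomically."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.free = [True] * n
        self.cond = threading.Condition()

    def pick_up(self, i: int) -> None:
        left, right = i, (i + 1) % self.n
        with self.cond:
            # Barrier: proceed only when BOTH forks are free, then take both.
            self.cond.wait_for(lambda: self.free[left] and self.free[right])
            self.free[left] = self.free[right] = False

    def put_down(self, i: int) -> None:
        left, right = i, (i + 1) % self.n
        with self.cond:
            self.free[left] = self.free[right] = True
            self.cond.notify_all()   # re-evaluate every waiting barrier


def dine(n: int = 5, meals: int = 100) -> List[int]:
    """Run n philosophers for a fixed number of meals; returns meals eaten."""
    table = Table(n)
    eaten = [0] * n

    def philosopher(i: int) -> None:
        for _ in range(meals):
            table.pick_up(i)
            eaten[i] += 1            # "eating" while holding both forks
            table.put_down(i)

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten
```

Calling `dine(5, 100)` finishes with every philosopher having eaten 100 times. The sketch removes deadlock but not starvation, which would need an additional fairness rule.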

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
