[ZODB-Dev] ZODB Deadlock issue python frame traces.

John D. Heintz jheintz@isogen.com
Fri, 11 May 2001 10:40:02 -0500


Sorry for how brief my previous email was, let me give you more details and 
fill in the gaps.  :-)


On Thursday 10 May 2001 15:40, Jeremy Hylton wrote:
> John,
>
> I'm not sure I understand what the problem is just from the stack
> trace.  Apparently, you've got a deadlock problem.  Which thread is
> blocked?  From the stack trace, it looks like one thread is trying to
> acquire the FileStorage "synchronized method" lock in a load() call.
> The other thread is attempting to acquire the DB object's method
> lock.

Both threads are blocked - in fact they are deadlocked wrt each other.
One thread owns the db lock and is waiting on the storage lock and the other 
thread is the other way around.

>
> The two locks use different implementations.  The FileStorage lock
> is a reentrant lock implemented by the ThreadLock module from
> ExtensionClass.  The DB lock is *either* a lock from the Python
> thread module or a no-op lock from bpthread.  One thing that
> confuses me about the stack trace is that it shows line 101 of
> bpthread, but my copy of bpthread only has 99 lines!  If bpthread
> is showing up anywhere, however, it suggests that there is a
> serious problem with the Python install.  bpthread only defines an
> allocate_lock() class if it can't import allocate_lock from
> thread.  What's gone wrong there?

Ug, sorry.  I modified bpthread to wap thread.allocate_lock() with a local 
class so that I can store and examine the thread id of the acquiring thread.

Here is the relevant snippet, but I really think that we can just ignore this 
code:

...
try:
    from thread import allocate_lock as thread_allocate_lock
    from thread import get_ident
    from thread import start_new_thread
    class allocate_lock:
        def __init__(self):
            self.lock = thread_allocate_lock()
            self.owner = None
            
        def acquire(self, waitflag=1):
            self.lock.acquire(waitflag)  # This is line 101
            self.owner = get_ident()
            
        def release(self):
            self.owner = None
            self.lock.release()
...

>
> Quite apart from my confusion about the bpthread module: Which thread
> is blocked?  Is it the FileStorage lock that is already held or is it
> the DB lock?  In either case, are there other threads in the system?

Both threads are blocked, as well as several others that are waiting on 
either one of these locks as well.  I think that the stack traces that I've 
explicitly list show the deadlock code path and that the other threads are 
just victims of this problem.

> The FileStorage lock should be reentrant, so another thread must hold
> it for that thread to be block.  But you haven't shown any other
> thread executing in FileStorage!  And FileStorage is pretty careful
> about using try-finally to guarantee that the lock is released.  (But
> maybe there's a bug there.)

Thread id 6151 has gone through code BaseStorage:228 before getting to 
DB:323. This means that thread 6151 must hold the storage lock and is waiting 
for the db lock.  Thread id 8201 has passed through DB:501 and is waiting on 
the storage lock.  

>
> Or are you showing the lock owners and not the place where the
> deadlock occurs?

I'm trying to show both.

>
> Jeremy
>
> PS I often find it helpful to debug locking problems by printing out a
> message whenever a lock is acquired or released.  Of course, the
> introduction of extra I/O can make the program behave differently,
> but, still, it has helped more often than not.  I don't seem to have
> any code on my work machine, but I recall writing a little wrapper
> using Python 2.1's sys._getframe() function that made it very easy.
> The wrapper can store the real lock as an attribute and implement
> acquire() and release() by calling them on its lock attribute.  The
> sys._getframe() can be used to print the location of the call.

Oh, I'm familiar with this code ;-)  In fact, I have created a patch for 
Python 2.1 that provides sys._getframes() function that returns a dict that 
maps thread id to frame for all PyThreadStates in the interpreter.

I've attached a more complete session log.  This is from a different 
execution of the program and therefore the deadlocked threads have different 
ids but the end result is the same.

About the only thing I can think of that can be causing this is that we are 
calling Connection.sync() and not just DB.open()ing a new Connection.

John

More complete session trace:
<thor._objref_ThorServer instance at 0x83a024c>
>>> import sys
>>> from pprint import pprint
>>> pprint(sys._getframes())
{1024: <frame object at 0x8292ba0>,
 3076: None,
 5126: <frame object at 0x87aaf98>,
 156679: <frame object at 0x83bdfc0>,
 159752: <frame object at 0x847fe50>,
 163850: <frame object at 0x87f5bb8>,
 165897: <frame object at 0x82afa70>,
 177163: <frame object at 0x87abdc8>}
>>> pprint(sys._getframes())
{1024: <frame object at 0x8292ba0>,
 3076: None,
 5126: <frame object at 0x87aaf98>,
 156679: <frame object at 0x83bdfc0>,
 159752: <frame object at 0x847fe50>,
 163850: <frame object at 0x87f5bb8>,
 165897: <frame object at 0x82afa70>,
 177163: <frame object at 0x87abdc8>}
>>> frames = _
>>> def print_line(f):
...     if f:
...             print "%s:%s" % (f.f_code.co_filename, f.f_lineno)
...
>>> def print_stack(f):
...     while f != None:
...             print_line(f)
...             f = f.f_back
...
>>> for k, f in frames.items():
...     print k
...     print print_line(f)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: items
>>> frames
<function translate at 0x82604bc>
>>> frames = sys._getframes()
>>> frames
{177163: <frame object at 0x87abdc8>, 163850: <frame object at 0x87f5bb8>, 
165897: <frame object at 0x82afa70>, 159752: <frame object at 0x847fe50>, 
156679: <frame object at 0x83bdfc0>, 5126: <frame object at 0x87aaf98>, 3076: 
None, 1024: <frame object at 0x87abf88>}
>>> for k, f in frames.items():
...     print k
...     print_line(f)
...
177163
/home/jheintz/lib/python/ZODB/bpthread.py:101
163850
/home/jheintz/lib/python/ZODB/FileStorage.py:588
165897
/home/jheintz/lib/python/ZODB/FileStorage.py:588
159752
/home/jheintz/lib/python/ZODB/bpthread.py:101
156679
/home/jheintz/lib/python/ZODB/bpthread.py:101
5126
/home/jheintz/lib/python/ZODB/FileStorage.py:588
3076
1024
<stdin>:1
>>> dbFrame = frames[177163].f_back
>>> db = dbFrame.f_locals['self']
>>> db
<ZODB.DB.DB instance at 0x83ac79c>
>>> dbLock = db._a.__self__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: __self__
>>> dir(db._a)
['__dict__', '__doc__', '__name__', 'im_class', 'im_func', 'im_self']
>>> dbLock = db._a.im_self
>>> dbLock
<ZODB.bpthread.allocate_lock instance at 0x83ac7c4>
>>> dir(dbLock)
['lock', 'owner']
>>> dbLock.owner
5126
>>> # Finally got the thread id that owns the db lock
...
>>>
>>> storFrame = frames[5126]
>>> stor = storFrame.f_locals['self']
>>> dir(stor)
['__name__', '_commit_lock_acquire', '_commit_lock_release', '_file', 
'_file_name', '_index', '_index_get', '_lock_acquire', '_lock_file', 
'_lock_release', '_nextpos', '_oid', '_pos', '_quota', '_serial', '_tfile', 
'_thl', '_tindex', '_transaction', '_ts', '_tstatus', '_tvindex', '_ude', 
'_vindex', '_vindex_get']
>>> storLock = stor._lock_acquire.__self__
>>> storLock
<ThreadLock object at 0x8116be0>
>>> storLock.count
0
>>> storLock.id
156679
>>> # Yeah, got the thread id of the owner of the Storage lock
...
>>> print_stack(frames[156679])
/home/jheintz/lib/python/ZODB/bpthread.py:101
/home/jheintz/lib/python/ZODB/DB.py:323
/home/jheintz/lib/python/ZODB/Connection.py:678
/home/jheintz/lib/python/ZODB/BaseStorage.py:228
/home/jheintz/lib/python/ZODB/Connection.py:671
/home/jheintz/lib/python/ZODB/Transaction.py:325
corbaFrameworks/ZODBCorbaFramework.py:262
thorCorbaServer.py:95
corbaFrameworks/ZODBCorbaFramework.py:245
thorCorbaServer.py:68
>>> print_stack(frames[5126])
/home/jheintz/lib/python/ZODB/FileStorage.py:588
/home/jheintz/lib/python/ZODB/Connection.py:520
/home/jheintz/lib/python/ZODB/Connection.py:224
/home/jheintz/lib/python/ZODB/DB.py:501
thorCorbaServer.py:68
>>>

>
>
> _______________________________________________
> For more information about ZODB, see the ZODB Wiki:
> http://www.zope.org/Wikis/ZODB/
>
> ZODB-Dev mailing list  -  ZODB-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zodb-dev

-- 
. . . . . . . . . . . . . . . . . . . . . . . .

John D. Heintz | Senior Engineer

1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.633.1198 | jheintz@isogen.com

w w w . d a t a c h a n n e l . c o m