[Zope] Frequent ZOPE crashes

Andreas Krasa andreas.krasa at wu-wien.ac.at
Mon Nov 30 02:59:22 EST 2009


Hi Tres,

thank you very much for your reply!

On 29.11.09 21:57, Tres Seaver wrote:
>
>>> ----- Original Message ----- From: "Andreas Krasa"
>>> <andreas.krasa at wu-wien.ac.at>
>>>
>> we're right in the process of tracking down the error outside of ZOPE.
>>
>> We have completely installed a new server from scratch with RHEL 5.4 and
>> have re-installed python 2.4.6 and the latest versions of libxml2 and
>> libxslt there. We double checked the LD config, and made sure that the
>> correct shared objects get loaded (via lsof).
>>
>> We also reinstalled a few other modules that contain C-code (such as
>> python-ldap) which we need for being able to do authentication.
>>
>> Unfortunately that didn't really help much. We still experience crashes.
>>
>> Are there any known issues with Zope 2.11.2, LibXML2 and/or LibXSLT that
>> could cause these problems?
>>
>> The only thing we re-used is the Data.fs, which we have to, because
>> we're talking about a production system here.
>>
>> Also note that we have used exactly the same setup for a long time now,
>> even on the same hardware, without any of these troubles. The problems
>> only started when we switched over to a new (and probably more
>> resource-intensive) layout.
>>
>> We're unfortunately still not able to reproduce these crashes.
>
> Can you set 'ulimit -c' to get a core file, which might at least help
> point to the extension which is to blame (although it may just show the
> "downstream" victim of a heap munge).
>
> What versions of libxml2 / libxslt are you using?  How about lxml?

Yes, we set the ulimit and did get a core dump for every crash (each
between 300 and 700 MB). We tried to debug them with "gdb", but
unfortunately they only reveal two situations in which the crashes
occur:

1) During garbage collection, where the gc tries to clean up damaged
Python objects
2) In the interpreter's evaluation loop ("ceval"), also while accessing
damaged Python objects
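
As a side note on the 'ulimit -c' setting: the same limit can also be
raised from inside a Python process. This is only a minimal sketch,
assuming a Unix host and a Python with the standard "resource" module;
the function name enable_core_dumps is hypothetical and not part of
Zope.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Hypothetical helper for illustration; returns the resulting
    (soft, hard) pair so the caller can log it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to,
    # but not beyond, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

If the hard limit itself is 0, this cannot help and the limit has to be
raised in the shell or init script that starts the process.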

Unfortunately the dumps don't reveal what exactly trashes the objects.
It seems to us that the damage happens some time before either of the
two code paths mentioned above touches the objects and crashes ZOPE.

For now we don't see a reproducible pattern; it seems to take some more
complex user behavior to trigger this. We were able to extract a few
URLs from the core dumps, but accessing them directly does nothing. The
last request logged in Z2.log before the core dump also triggers
nothing when accessed directly.
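
Since single URLs do not trigger the crash, one option is to collect the
whole run of requests that preceded a crash and replay them in order.
The following is only a sketch under assumptions: it assumes Z2.log
lines follow the Apache combined log format, and the function name
last_requests is hypothetical.

```python
import re
from collections import deque

# The request is the first quoted field: "METHOD /path HTTP/x.y"
REQUEST_RE = re.compile(r'"([A-Z]+) (\S+) [^"]*"')


def last_requests(lines, n):
    """Return the last n (method, path) pairs found in the log lines."""
    recent = deque(maxlen=n)  # keeps only the n most recent entries
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            recent.append((match.group(1), match.group(2)))
    return list(recent)
```

Passing an open log file as "lines" works, because the function only
iterates over it; lines that do not contain a request are skipped.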

We're now running ZOPE-2.11.2 with an eggified version of ZODB3-3.8.4
plus libxml2-2.7.6, libxslt-1.1.26 and lxml-2.2.4, and the crashes still
happen. Previously we ran ZOPE-2.11.2 with libxml2-2.7.3,
libxslt-1.1.24 and lxml-2.1.5, which also crashed occasionally.

The crashes only started after we switched to a new layout (probably in
combination with a few minor Silva updates).

We had been using the same system software (RHEL5), hardware, Python
version and libxml2/libxslt/lxml versions with our old layout, where
everything worked fine for years.

I would be happy to paste any particular gdb output if that would
help.

Kind regards,
Andreas

