[Zope] ZEO troubles on RedHat EL4 Linux

Thu Aug 18 17:16:05 EDT 2005

[Andreas Krasa]
> ...
> As I understood Dieter's mail, this strange behavior is caused by the
> way RedHat Enterprise Linux 4 system libraries handle SIG_IGN/SIGCHLD.

I don't know.  Dieter asked whether you ran the tests via "zopectl
test", but I didn't see an answer to that.  If you run the Zope tests
directly ("python test.py"), then the ZODB/ZEO tests never touch the
OS's default handler for SIGCHLD; if you do use zopectl, zopectl.py
_does_ set its own handler for SIGCHLD.
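
One quick way to see which situation a given process is in is to ask
Python what is currently installed for SIGCHLD.  This is only a
minimal sketch for POSIX systems; describe_sigchld() is a made-up
helper name, not something from Zope, ZODB, or zdaemon:

```python
import signal

def describe_sigchld():
    """Report what this process currently has installed for SIGCHLD."""
    handler = signal.getsignal(signal.SIGCHLD)
    if handler == signal.SIG_DFL:
        return "default"
    if handler == signal.SIG_IGN:
        return "ignored"
    if handler is None:
        # getsignal() returns None when the handler was not installed from Python.
        return "installed outside Python"
    return "custom handler: %s" % getattr(handler, "__name__", repr(handler))
```

Calling describe_sigchld() at the start of a test run, once under
"python test.py" and once under "zopectl test", would show whether
the two runs really start from different SIGCHLD dispositions.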

I'm not sure Dieter's info is current either.  The SIGCHLD handler in
current Zope 2.7.7's zopectl.py explicitly catches and ignores the
specific exception you reported:

def _ignoreSIGCHLD(*unused):
    while 1:
        try: os.waitpid(-1, os.WNOHANG)
        except OSError: break

...

    signal.signal(signal.SIGCHLD, _ignoreSIGCHLD)

But it looks like Dieter added that code to begin with, so it's hard
to believe he forgot about it ;-)

> If this problem was due to some improper Zope methods, most people would
> have this sort of problems. Which is not the case. That makes me believe
> that the failure of ZEO tests actually is caused by some uncommon or
> improper implementation of those two handles - which, in my opinion,
> makes it something RedHat should take a look at.

I don't believe anyone at this point knows why you're seeing this
problem; the best way to make progress is to whittle it down to a
small, self-contained test case.  "Some ZEO tests fail sometimes"
still involves mountains of code, including everything from the OS
kernel to hundreds of .py files.   The ZEO test process setup isn't
anywhere near as complicated as zopectl, or as anything relying on
zdaemon:  the ZEO tests spawn processes directly via Python's
os.spawnve(), and later wait for them to end, via the waitpid() code
shown earlier.  It doesn't muck around with signals, forks, or
anything else that should be platform-dependent (the same ZEO-test
process code is used on both Linux and Windows, BTW -- for this
reason, it can't rely on any fancy signal or process gimmicks;
spawnve+waitpid is the entire story here).
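
A small test case along those lines might look like the sketch below.
It is not the ZEO test code: spawn_child() and wait_for_exit() are
hypothetical names, the timeout and poll interval are arbitrary, and
because it uses os.WNOHANG it is a POSIX-only illustration of the
spawnve+waitpid pattern:

```python
import os
import sys
import time

def spawn_child(args):
    """Start a Python child process without blocking; return its pid."""
    argv = [sys.executable] + list(args)
    return os.spawnve(os.P_NOWAIT, sys.executable, argv, os.environ)

def wait_for_exit(pid, timeout=30.0, poll=0.1):
    """Poll until the child exits.

    Returns the raw wait status, or None if the child is still running
    when the timeout expires.  os.waitpid() raises OSError (ECHILD) if
    the process has no such child, for example because something else
    already reaped it.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return status
        # (0, 0) means the child exists but has not exited yet.
        time.sleep(poll)
    return None
```

Spawning a child that exits right away, e.g.
spawn_child(["-c", "import sys; sys.exit(3)"]), and passing the pid to
wait_for_exit() exercises the same two calls with none of Zope, ZODB,
or ZEO involved.  If that fails on the RHEL4 box too, it's a much
smaller thing to show RedHat.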

> ...
> Anyway - how severe are those testing failures for actually USING a ZEO
> client/server on that particular OS as a production system?

All the failures you showed were in test teardown.  If that's all the
failures you got, then all the test bodies actually passed.  Of course
you have to be wary that normal methods of detecting child-process
termination aren't working as hoped on this box, because all the test
failures you reported were exactly failures to detect child-process
termination.  I don't know how much of that Zope does, but I can say
ZODB/ZEO never does that in normal operation (spawning multiple
processes, in ZODB+ZEO, is unique to the testing code; a ZEO server is
a single process, and doesn't spawn other processes while it's running).