[Zope] EAGAIN errors crashing ZServer Aiieeee!!!!

Jon Prettyman jprettyman@acm.org
14 Mar 2000 09:29:44 -0800


I'm pretty sure now that async_loop is not exiting in my child process
so I'm guessing it is some other type of abort.  I'll try the gdb
avenue.  I'm pretty handy with that, but I've never done much
debugging of Python or threaded apps, so....

I also grepped around the code and looked over every 11, sys.exit and
SystemExit I could find.  This is a real pain.
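
For anyone following along, here is a minimal sketch (not Zope's code) of
how a watcher parent could tell "child called exit(11)" apart from "child
was killed by signal 11".  It assumes a Unix system and a current Python 3;
run_child and describe are hypothetical helper names made up for this
example.

```python
import os
import signal


def run_child(action):
    """Fork, run action() in the child, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child process: run the action, then leave without parent cleanup.
        action()
        os._exit(0)
    # Parent process: block until the child is gone.
    _, status = os.waitpid(pid, 0)
    return status


def describe(status):
    """Turn a raw wait status into a human-readable description."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "other status %d" % status


if __name__ == "__main__":
    # Child that exits on its own with code 11.
    print(describe(run_child(lambda: os._exit(11))))
    # Child that receives SIGSEGV (sent to itself here, for simplicity).
    print(describe(run_child(lambda: os.kill(os.getpid(), signal.SIGSEGV))))
```

The point of the sketch is that the raw status from a wait call packs the
exit code and the terminating signal into one integer, so printing it
undecoded can be misleading.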

-Jon

Michel Pelletier <michel@digicool.com> writes:

> Jon Prettyman wrote:
> > 
> > I've been reading through code trying to figure out what is going on
> > here, where this message might be coming from.  My current train of
> > thought is that the 11 exit code being seen in z2.py is a result of
> > sys.ZServerExitCode getting set somewhere and z2.py exiting with that
> > code.
> > 
> > So I've been trying to find where code sets sys.ZServerExitCode and
> > what I've found is in ZServer.HTTPResponse.ChannelPipe.close.  In this
> > routine, the value of self._shutdown is assigned to r which then gets
> > assigned to sys.ZServerExitCode.
> > 
> > It looks like self._shutdown only gets assigned when
> > ZServer.HTTPResponse.ChannelPipe.finish gets called and a response
> > header contains a bobo-exception-type of exceptions.SystemExit.
> > 
> > So I'm guessing now that somewhere this exception is getting set but I
> > can't seem to figure out why.
> 
> There is only one reason I can think of: clicking on the Shutdown
> button.
> 
> A quick grep shows sys.exit being called in a few places, however, most
> specifically in xmllib and pyexpat.  None of these calls set a value of 11
> though.
> 
> Are you using any XML?  XML-RPC calls?
> 
> > Am I completely off base here?
> 
> Perhaps, but it's a good avenue to look down.  What I suspect is
> that something is happening (like a SIGSEGV) that is causing the OS to
> send a signal to the process whose default action is to kill the
> process, setting the error code to 11 for some as-yet mysterious reason
> (11 has *got* to be the clue however, I refuse to believe that it is
> arbitrary, note also that SIGSEGV is signal 11, coincidence?).  A good
> exercise may be to run Zope in gdb and wait; when one of these events
> happens, use gdb to inspect what's going on.  I'm no gdb expert, but I
> was under the impression that it could tell you when signals arrive (or
> perhaps stop the process on the arrival of a signal...).
> 
> I wrote a test script that forked a parent and child just like Zope. 
> When I sent the child a SIGSEGV it returned an error code of 332.  Maybe
> my experiment is flawed.  
> 
> Have you enough time to look that deeply?
> 
> -Michel
> > 
> > -Jon
> > 
> > Michel Pelletier <michel@digicool.com> writes:
> > 
> > > Hmm, a simple test script seems to indicate that sending a child process
> > > a signal 11 does not cause it to dump core.  Of course, it also does not
> > > seem to cause it to return a status code of 11 either.  This might not
> > > be a SIGSEGV, it might just be a coincidence that the return code of the
> > > (crashed) child is 11.  (FYI, there are two processes involved here, a
> > > 'watcher' parent and a child; the parent prints the 'Aieee!' when the
> > > child dies.)
> > >
> > > Does anyone out there know what a python program returning 11 means?
> > >
> > > -Michel
> > >
> > > Jon Prettyman wrote:
> > > >
> > > > Nope.  No core file.
> > > >
> > > > Aieeeee!!!!
> > > > -Jon