[Zope] EAGAIN errors crashing ZServer Aiieeee!!!!

Michel Pelletier michel@digicool.com
Tue, 14 Mar 2000 09:19:17 -0800


Jon Prettyman wrote:
> 
> I've been reading through code trying to figure out what is going on
> here, where this message might be coming from.  My current train of
> thought is that the 11 exit code being seen in z2.py is a result of
> sys.ZServerExitCode getting set somewhere and z2.py exiting with that
> code.
> 
> So I've been trying to find where code sets sys.ZServerExitCode and
> what I've found is in ZServer.HTTPResponse.ChannelPipe.close.  In this
> routine, the value of self._shutdown is assigned to r which then gets
> assigned to sys.ZServerExitCode.
> 
> It looks like self._shutdown only gets assigned when
> ZServer.HTTPResponse.ChannelPipe.finish gets called and a response
> header contains a bobo-exception-type of exceptions.SystemExit.
> 
> So I'm guessing now that somewhere this exception is getting set but I
> can't seem to figure out why.

I can think of only one reason for that: someone clicking on the
Shutdown button.

A quick grep shows sys.exit being called in a few places, however, most
notably in xmllib and pyexpat.  None of these calls passes a value of
11, though.

Are you using any XML?  XML-RPC calls?

> Am I completely off base here?

Perhaps, but it's a good avenue to look down.  What I suspect is that
something is happening (like a SIGSEGV) that causes the OS to send the
process a signal whose default action is to kill it, and that the error
code ends up as 11 for some as-yet mysterious reason.  (11 has *got* to
be the clue; I refuse to believe that it is arbitrary.  Note also that
SIGSEGV is signal 11.  Coincidence?)  A good exercise may be to run Zope
under gdb and wait; when one of these events happens, use gdb to inspect
what's going on.  I'm no gdb expert, but I was under the impression that
it can tell you when signals arrive (or perhaps stop the process on the
arrival of a signal...).
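
To make the "signal 11 versus exit code 11" question concrete, here is
a minimal sketch of how a raw wait status can be decoded in Python.  It
assumes the watcher reports the raw status word from os.wait() or
os.waitpid() (I have not confirmed that this is what zdaemon does), and
it assumes the usual Linux encoding of that word.  The helper name
describe_wait_status is made up for illustration.

```python
import os

def describe_wait_status(status):
    """Decode a raw status word as returned by os.wait()/os.waitpid().

    Hypothetical helper for illustration; not part of Zope or zdaemon.
    """
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal it did not handle.
        text = "killed by signal %d" % os.WTERMSIG(status)
        if os.WCOREDUMP(status):
            text += " (core dumped)"
        return text
    if os.WIFEXITED(status):
        # The child called exit() (or sys.exit()) itself.
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "other status %d" % status

# On Linux a raw status of 11 means "killed by signal 11", whereas a
# child that called sys.exit(11) would show up as 11 << 8.
print(describe_wait_status(11))
print(describe_wait_status(11 << 8))
```

If the logged number really is a raw status, then 11 would point at a
signal rather than at a sys.exit(11) call, and the missing core-dump
flag would be consistent with there being no core file.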

I wrote a test script that forks a child from a watcher parent, just
as Zope does.  When I sent the child a SIGSEGV, it returned an error
code of 332.  Maybe my experiment is flawed.
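
For anyone who wants to repeat the experiment, the following is a
rough sketch of that kind of test, not the script I actually ran.  The
function name run_experiment is made up, and the child here does
nothing but sleep, which is a much simpler setup than a real Zope
process.

```python
import os
import signal
import time

def run_experiment(signum=signal.SIGSEGV):
    """Fork a child, send it `signum`, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure the signal has its default action, then idle
        # until the parent kills us.
        try:
            signal.signal(signum, signal.SIG_DFL)
            while True:
                time.sleep(60)
        finally:
            # Never fall back into the parent's code path.
            os._exit(1)
    # Parent ("watcher"): signal the child and collect its status.
    os.kill(pid, signum)
    _, status = os.waitpid(pid, 0)
    return status

status = run_experiment()
print("raw status:", status)
if os.WIFSIGNALED(status):
    print("child was killed by signal", os.WTERMSIG(status))
elif os.WIFEXITED(status):
    print("child exited with code", os.WEXITSTATUS(status))
```

The raw status is printed as well as the decoded form because the two
can differ, for example when the core-dump flag is set.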

Have you enough time to look that deeply?

-Michel
> 
> -Jon
> 
> Michel Pelletier <michel@digicool.com> writes:
> 
> > Hmm, a simple test script seems to indicate that sending a child process
> > a signal 11 does not cause it to dump core.  Of course, it also does not
> > seem to cause it to return a status code of 11 either.  This might not
> > be a SIGSEGV, it might just be a coincidence that the return code of the
> > (crashed) child is 11.  (FYI, there are two processes involved here, a
> > 'watcher' parent and a child; the parent prints the 'Aieee!' when the
> > child dies.)
> >
> > Does anyone out there know what a python program returning 11 means?
> >
> > -Michel
> >
> > Jon Prettyman wrote:
> > >
> > > Nope.  No core file.
> > >
> > > Aieeeee!!!!
> > > -Jon
> > >
> > > > > Okay, this is what I'm getting consistently when my server crashes
> > > > > under moderate load:
> > > > >
> > > > > ------
> > > > > 2000-03-13T18:53:02 INFO(0) GUF Successful authentication for user April (http://207.241.10.50/premium/acl_users)
> > > > > ------
> > > > > 2000-03-13T19:01:40 INFO(0) GUF Successful authentication for user thomas (http://www.nationalmortgagenews.com/premium/acl_users)
> > > > > ------
> > > > > 2000-03-13T19:02:24 INFO(0) GUF Successful authentication for user terrydpeters (http://207.241.10.50/premium/acl_users)
> > > > > ------
> > > > > 2000-03-13T19:03:36 ERROR(200) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Aiieee! 17065 exited with error code: 11
> > > > > ------
> > > > > 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Houston, we have forked
> > > > > ------
> > > > > 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Hi, I just forked off a kid: 17125
> > > > > ------
> > > > > 2000-03-13T19:03:36 INFO(0) zdaemon zdaemon: Mon Mar 13 13:03:36 2000: Houston, we have forked
> > > > > ------
> > > > > 2000-03-13T19:03:58 PROBLEM(100) ZServer Cannot do reverse lookup
> > > > > ------
> > > > > 2000-03-13T19:03:58 INFO(0) ZServer Medusa (V1.13) started at Mon Mar 13 13:03:58 2000
> > > > >         Hostname: 207.241.10.50
> > > > >         Port:80
> > > > >
> > > > > ------
> > > > > 2000-03-13T19:03:58 INFO(0) ZServer FTP server started at Mon Mar 13 13:03:58 2000
> > > > >         Authorizer:None
> > > > >         Hostname: magmar
> > > > >         Port: 8021
> > > > >
> > > > > From the debug log:
> > > > > B 144485000 2000-03-13T19:03:02 GET /nmn/images/marketplacebutton.gif
> > > > > I 144485000 2000-03-13T19:03:02 0
> > > > > A 144485000 2000-03-13T19:03:02 304 182
> > > > > E 144485000 2000-03-13T19:03:02
> > > > > B 144484496 2000-03-13T19:03:09 GET /id.htm
> > > > > I 144484496 2000-03-13T19:03:09 0
> > > > > A 144484496 2000-03-13T19:03:09 200 3740
> > > > > E 144484496 2000-03-13T19:03:09
> > > > > B 143155832 2000-03-13T19:03:10 GET /newsubsid.gif
> > > > > I 143155832 2000-03-13T19:03:10 0
> > > > > A 143155832 2000-03-13T19:03:10 200 45513
> > > > > E 143155832 2000-03-13T19:03:10
> > > > > B 145239016 2000-03-13T19:03:14 GET /nmn/images/marketplacebutton.gif
> > > > > I 145239016 2000-03-13T19:03:14 0
> > > > > A 145239016 2000-03-13T19:03:14 304 163
> > > > > E 145239016 2000-03-13T19:03:14
> > > > > B 146772968 2000-03-13T19:03:32 GET /nmn/images/afstitle.gif
> > > > > I 146772968 2000-03-13T19:03:32 0
> > > > > A 146772968 2000-03-13T19:03:32 304 182
> > > > > E 146772968 2000-03-13T19:03:32
> > > > > B 139605816 2000-03-13T19:03:59 GET /nmn/images/marketplacebutton.gif
> > > > > I 139605816 2000-03-13T19:03:59 0
> > > > > A 139605816 2000-03-13T19:04:01 304 163
> > > > > E 139605816 2000-03-13T19:04:01
> > > > > B 139494360 2000-03-13T19:04:07 POST /premium/acl_users/register
> > > > > I 139494360 2000-03-13T19:04:07 518
> > > > > A 139494360 2000-03-13T19:04:10 200 8704
> > > > > E 139494360 2000-03-13T19:04:10
> > > > > B 139787784 2000-03-13T19:04:28 GET /nmn/images/marketplacebutton.gif
> > > > > I 139787784 2000-03-13T19:04:28 0
> > > > > A 139787784 2000-03-13T19:04:28 304 182
> > > > > E 139787784 2000-03-13T19:04:28
> > > > >
> > > > > I'm running Linux 2.2.12-20, Zope 2.1.4, currently with only ZServer
> > > > > (and FTP server.)  Things get even flakier when I am running with
> > > > > PCGI.
> > > > >
> >
> > _______________________________________________
> > Zope maillist  -  Zope@zope.org
> > http://lists.zope.org/mailman/listinfo/zope
> > **   No cross posts or HTML encoding!  **
> > (Related lists -
> >  http://lists.zope.org/mailman/listinfo/zope-announce
> >  http://lists.zope.org/mailman/listinfo/zope-dev )
> 