[Zope] Zope hiccuping

Chris Kratz chris.kratz@vistashare.com
Wed, 7 Nov 2001 14:33:48 -0500


We have been noticing that periodically we get an error from IE that says
"Cannot find server or DNS error".  It is not easily reproducible (except by
just clicking links on the server), and an F5 refresh in the browser [almost]
always loads the page correctly.  I turned on logging today with the -M
startup option and observed the following entries when it happened:

B 145697132 2001-11-07T17:15:38 GET /OutcomeTracker/Dev_News
I 145697132 2001-11-07T17:15:38 0
A 145697132 2001-11-07T17:15:39 200 32155
E 145697132 2001-11-07T17:15:39
B 145934020 2001-11-07T17:15:40 GET /OutcomeTracker/PeopleOrganizations/index_html
I 145934020 2001-11-07T17:15:40 0
B 135053764 2001-11-07T17:15:56 POST /OutcomeTracker/Activities/index_html
I 135053764 2001-11-07T17:15:56 2831
B 146464388 2001-11-07T17:15:57 GET /OutcomeTracker/PeopleOrganizations/index_html
I 146464388 2001-11-07T17:15:57 0
A 146464388 2001-11-07T17:16:03 200 31200
E 146464388 2001-11-07T17:16:03

Notice how the first GET /OutcomeTracker/PeopleOrganizations/index_html never
gets its A or E lines; it has only a B and an I line.  The subsequent refresh
finished the request.  Interestingly, the two incomplete requests are not
logged to z2.log.  We can see the request before and the request after, but
that's it.  The other strangeness is that in the Postgres log, we see a
"pq_recvbuf: unexpected EOF on client connection".  This seemed to point to
Zope threads dying.  Since I'm not getting anything in the logs (*see below),
I started running tests with one eye on the currently running processes.
And sure enough, whenever I got that error at the browser (cannot find
server...), *all* of the Zope threads (except the main starter thread) die
quietly and come back with new PIDs.  It really looks as if the entire
startup sequence is rerun.  With Z_DEBUG_MODE on I can watch it go
through the startup sequence again whenever this happens.  But there are no
tracebacks.  It's just as if somebody clicked restart in the middle of a
process.
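
For anyone wanting to pick such requests out of a longer -M log, here is a
rough Python sketch.  It assumes each entry sits on one line in the actual
log file (the wrapping above comes from the mail client), and that B marks the
start of a request and E its end, which is my reading of the entries shown.
The function name is made up for illustration.

```python
def find_incomplete_requests(lines):
    """Return {request_id: request_text} for requests with a B line but no E line.

    Assumes one log entry per line, in the form: CODE ID TIMESTAMP [REST].
    """
    open_requests = {}
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank line or wrapped fragment, not a full entry
        code, request_id = parts[0], parts[1]
        if code == "B":
            # Remember the request until its matching E line shows up.
            open_requests[request_id] = parts[3].strip() if len(parts) > 3 else ""
        elif code == "E":
            open_requests.pop(request_id, None)
    return open_requests


# The entries quoted above, unwrapped to one entry per line.
sample = """\
B 145697132 2001-11-07T17:15:38 GET /OutcomeTracker/Dev_News
I 145697132 2001-11-07T17:15:38 0
A 145697132 2001-11-07T17:15:39 200 32155
E 145697132 2001-11-07T17:15:39
B 145934020 2001-11-07T17:15:40 GET /OutcomeTracker/PeopleOrganizations/index_html
I 145934020 2001-11-07T17:15:40 0
B 135053764 2001-11-07T17:15:56 POST /OutcomeTracker/Activities/index_html
I 135053764 2001-11-07T17:15:56 2831
B 146464388 2001-11-07T17:15:57 GET /OutcomeTracker/PeopleOrganizations/index_html
I 146464388 2001-11-07T17:15:57 0
A 146464388 2001-11-07T17:16:03 200 31200
E 146464388 2001-11-07T17:16:03
""".splitlines()

for request_id, request in find_incomplete_requests(sample).items():
    print(request_id, request)
```

On the sample this reports requests 145934020 and 135053764, the two that
never finished.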

The one glimmer of hope is in the stupid log file:

2001-11-07T19:30:23 ERROR(200) zdaemon zdaemon: Wed Nov  7 14:30:23 2001: Aiieee! 1925 exited with error code: 11
...restarting...

Here are the questions:

1. It appears that something is causing those threads to crash (or end), but
nothing is getting put in the log file.  Is there any way to get the
tracebacks I assume are happening or to find out what is going on?
2. Alternatively, is there a way to run Zope in single-threaded mode?
Z_DEBUG_MODE appears to only apply to the main thread because it goes ahead
and spawns additional threads.  If I use -t 0 I get two processes, but no
response from a web browser request.  If I use -t 1, I get three processes
owned by nobody and the original one by root.
3. Any further ideas on how to debug this thing?  Where do I find what error
code 11 is?
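
On the error code: one possibility, and it is only an assumption, is that
the number zdaemon prints is the raw status from os.wait() rather than a plain
exit code.  If so, it can be decoded with the standard os and signal modules
on a POSIX system, as in this sketch (the function name is made up):

```python
import os
import signal


def describe_wait_status(status):
    """Describe a raw POSIX wait status, as returned by os.wait()."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return "killed by signal %d (%s)" % (signum, name)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "unrecognized status %d" % status


# ASSUMPTION: 11 is a raw wait status, not an ordinary exit code.
print(describe_wait_status(11))
```

If that assumption holds, 11 means the child was killed by signal 11, which
on Linux is SIGSEGV (a segmentation fault).  That would also fit with there
being no Python traceback.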

Thanks for your time and help,

-Chris

------------------------------
Chris Kratz
chris.kratz@vistashare.com