[Zope-dev] What makes Zope twirl?

Thu, 6 Feb 2003 19:24:37 -0800

Or: when zope goes into a nonresponsive state,
what can you do to diagnose the cause?

The event that prompts this question:

Our production system (2 zeo clients) went down today.
Platform: Linux 2.4, Zope 2.5.1 from source (wo_pcgi), 
Python 2.1.3 from source, running behind Apache
for one site, and a custom java proxy for another site
(don't ask). ZServer is not exposed to anything
except the servers running Apache and the Java proxy.

All the zope processes were still running,
CPU usage was low (almost nil for python), * 
and there was plenty of free physical memory & swap.
Yet Zope was not responding to requests.
A look at the access logs revealed that
zope had not logged anything since the time
we noticed the outage. Nothing
unusual before that except AltaVista crawling our site
(a measly 2 requests / second).
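
A rough sketch of a watchdog that would catch this kind of
quiet wedge (process alive, low CPU, no answers). It checks two
things: whether the access log has stopped growing, and whether
the HTTP port still returns any bytes within a timeout. The names
log_is_stale and port_answers, and the thresholds, are made up
for illustration, and it is written for a current Python rather
than 2.1.3.

```python
import os
import socket
import time


def log_is_stale(path, max_age_seconds, now=None):
    """True if the file at `path` was last modified more than
    `max_age_seconds` ago (hypothetical helper)."""
    if now is None:
        now = time.time()
    return (now - os.path.getmtime(path)) > max_age_seconds


def port_answers(host, port, timeout=5.0):
    """Send a minimal HTTP request and report whether any bytes
    come back before `timeout` seconds pass (hypothetical helper).

    A wedged server typically still accepts the TCP connection
    (the kernel queues it) but never writes a response, so a
    refused connection and a read timeout both count as "no answer".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            return bool(sock.recv(1))
    except OSError:
        # Covers connection refused, resets, and socket timeouts.
        return False
```

Run from cron against each zeo client, e.g.
port_answers("localhost", 8080) together with log_is_stale on the
access log (host, port and path are placeholders). This only tells
you *when* the wedge started, not why, but that timestamp narrows
down which log entries are worth reading.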

A restart seemed to fix everything, though one
of the zeo servers went down again (same symptoms)
about 20 minutes after starting. Restarted it again
and both servers have been fine for hours now.

This seems to be rare; I haven't seen it before on
this particular server, but I saw a similar wedge
on our dev machine about 3 weeks ago.

I've looked at ALL the logs (access log, zeo log,
zope stdout / stderr log) and found nothing at all
unusual, just the aforementioned AltaVista crawl
and a couple of RAM Cache errors from non-pickleable
objects that I need to dis-associate from the cache.
But none of this is new.

Is it time for "Big M"? Would that give me anything
useful?
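
If the detailed log does turn out to be the way to go, here is
a rough sketch of how it could be mined for requests that started
but never finished, which is the interesting set when the server
hangs. It ASSUMES (not checked against 2.5.1) that every line
starts with a one-letter code, "B" when a request begins and "E"
when it ends, followed by a request id; everything after that is
treated as opaque. find_unfinished is a made-up name, and the code
targets a current Python rather than 2.1.3.

```python
def find_unfinished(lines):
    """Return {request_id: rest_of_begin_line} for every request
    that has a begin ("B") line but no matching end ("E") line.

    Assumed line layout: "<code> <request-id> <anything else>".
    Lines with other codes, or with fewer than two fields, are ignored.
    """
    pending = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        code, req_id = parts[0], parts[1]
        rest = parts[2].strip() if len(parts) > 2 else ""
        if code == "B":
            # A reused id simply overwrites the older entry.
            pending[req_id] = rest
        elif code == "E":
            pending.pop(req_id, None)
    return pending


if __name__ == "__main__":
    import sys

    # Usage: python find_unfinished.py /path/to/detailed.log
    with open(sys.argv[1]) as log:
        for req_id, info in find_unfinished(log).items():
            print(req_id, info)
```

Whatever is left in the result at the moment of the wedge is a
list of requests that were in flight, which would at least show
whether the same URLs are involved each time.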

* This does not sound like other zope "spins"
I have heard of, in which python eats 99% CPU
indefinitely due to (probably) an application error.
See for example:
http://www.zopezen.org/Members/zopista/News_Item.2003-01-28.1025 

-- 

Paul Winkler
http://www.slinkp.com
Look! Up in the sky! It's CAPTAIN STETOSCOPE!
(random hero from isometric.spaceninja.com)