[Zope] Can't stop Zope, machine hanging

Dieter Maurer dieter at handshake.de
Tue Sep 5 13:55:41 EDT 2006


Ken Ara wrote at 2006-9-5 07:47 -0700:
>...
>Of immediate concern to me is whether I can do
>anything to prevent this happening again. From time to
>time, my Zope hangs, usually because of an attack by a
>bad robot requesting lots of complex pages and sending
>no-cache headers. Then I am able to restart Zope and
>all is well. For a while, when these attacks were
>frequent, I had a crontab to zopectl restart every
>hour. 

There are tools (I think "daemontools", but I may be wrong)
that can automate this more intelligently than a cron job.

We have our own check server which polls Zope and restarts it
if it does not respond in time.
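
A minimal sketch of such a poll-and-restart check, in Python. This is
not our actual check server: the URL, the timeout and the path to
"zopectl" are hypothetical placeholders, and the script assumes it is
run periodically (for example from cron) with permission to restart
the instance.

```python
import http.client
import subprocess
import urllib.error
import urllib.request

# Hypothetical values; adjust them to your own installation.
ZOPE_URL = "http://localhost:8080/"
RESTART_CMD = ["/path/to/instance/bin/zopectl", "restart"]
TIMEOUT_SECONDS = 30


def is_responsive(url, timeout):
    """Return True if the server answers an HTTP request within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so it is not hung.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, reset or a broken response.
        return False


def check_and_restart(url=ZOPE_URL, timeout=TIMEOUT_SECONDS, restart=None):
    """Restart only when the poll fails; return True if a restart was triggered."""
    if is_responsive(url, timeout):
        return False
    if restart is None:
        # Default action: run the (assumed) zopectl restart command.
        def restart():
            subprocess.call(RESTART_CMD)
    restart()
    return True


if __name__ == "__main__":
    check_and_restart()
```

Unlike an hourly "zopectl restart", this restarts only when a poll
actually fails, so a healthy Zope is left alone.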


>But this event was different and I would like to know
>if anyone thinks that something I am doing wrong could
>cause the Zope process to become 'unkillable' and
>require a reset of the machine. Has anyone else had
>this problem?

In Python versions up to 2.3.4 and in 2.4.0 (fixed in Python 2.3.5 and 2.4.1),
a fatal signal (like "SIGSEGV") could bring Zope into a state where
its main thread was killed but the child threads were still alive.
These child threads could only be killed with "kill -9".

Although we now use Python 2.4.1, I saw a similar problem just
a few days ago. But almost certainly, this has to do with the
Java Virtual Machine which we now also integrate into our Zope instances.


However, when even "kill -9" (as "root") is no longer able to kill a process,
then the process is blocked somewhere deep inside the operating system (where
signal handling is deactivated for consistency reasons).
Usually, this indicates a network problem.
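
On Linux (an assumption; the operating system was not named in this
thread), such a process usually shows up in state "D" (uninterruptible
sleep) in the output of "ps". The following sketch lists such
processes by reading /proc/<pid>/stat. The function names are
illustrative only, and it will not work on systems without a
Linux-style /proc.

```python
import os


def parse_state(stat_line):
    """Extract (command name, state letter) from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may itself contain spaces or
    # parentheses, so split at the last ")" instead of on whitespace.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    name = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return name, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, name) for every process currently in state "D"."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                name, state = parse_state(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the line could not be parsed
        if state == "D":
            found.append((int(entry), name))
    return found


if __name__ == "__main__":
    for pid, name in uninterruptible_processes():
        print(pid, name)
```

A process that stays in this list for a long time is waiting on the
kernel (for example on I/O) and will not react to signals until that
wait ends.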

And if your operating system is no longer able to shut down, then
you have an even more fundamental problem -- maybe also connected
to network problems.


I fear we cannot help you much, as an intensive analysis of your
system would be necessary in order to find the causes of your
problems.

>I would have liked to perform some diagnostic on the
>machine in its stuck state, but neither I nor the ISP
>knew where to start.

Usually, one would start with an analysis of the operating system
log files.

If they do not tell you anything, then one would check what is still
working (e.g. is the console still responding, does it still
honor the magic "CTRL-ALT-DEL" reboot key sequence), which commands
fail and in what way, ...



-- 
Dieter

