[Zope-dev] zdaemon fix

Guido van Rossum guido@python.org
Sat, 05 Oct 2002 13:32:32 -0400


> > I am thinking of a fairly simple change to zdaemon: if it finds that
> > it is continuously respawning the program more than 10 times in 2
> > minutes, it assumes there is a fatal error, logs a PANIC level message,
> > and exits.
> 
> > (I took the criterion, but not the response from init(8).)
> 
> In this scenario init pauses for a few minutes, rather than
> aborting. I would like an option to prevent zdaemon aborting, and I
> am surprised you don't want it as the default.
> 
> I think init uses a simple fixed pause... an exponential backoff would 
> probably be smarter (like how a disconnected ZEO ClientStorage tries to 
> reconnect to its server)

I thought about this, and figured it wasn't necessary.  Unlike init,
zdaemon only manages one process.  When that process doesn't get past
its initialization, manual intervention is normally required to make
it run again; that manual intervention can include restarting it.

But I have to admit that the use case I've been thinking of is that of
starting zeo and finding that it crashes immediately, over and over.
There the auto-stop is just what you need (there's no point in filling
up the log file while you're thinking about what could have caused
this).
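
For illustration, here is a minimal sketch of the auto-stop criterion
(more than 10 respawns within 2 minutes). It is not zdaemon's actual
code: the class name RespawnLimiter, its parameters, and the record()
method are all hypothetical, and the caller is assumed to pass in the
current time in seconds.

```python
import collections


class RespawnLimiter:
    """Hypothetical helper, not part of zdaemon: detects a respawn loop."""

    def __init__(self, max_respawns=10, window=120.0):
        self.max_respawns = max_respawns  # respawns tolerated per window
        self.window = window              # window length in seconds
        self.times = collections.deque()  # timestamps of recent respawns

    def record(self, now):
        """Record a respawn at time `now`; return True if we should give up."""
        self.times.append(now)
        # Forget respawns that happened before the start of the window.
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.max_respawns
```

A supervisor loop would call record(time.time()) each time it restarts
the child, and log a PANIC message and exit when the result is true.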

There's a different use case where something changes in the
environment after the program has run successfully for a while, which
causes it to crash and causes subsequent restarts to crash
immediately.  It is *possible* that the environment fixes itself after
a while -- it could be something like a network, DNS or NFS outage --
and then an auto-restart option might be nice.
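
A sketch of what such an auto-restart option could look like, using the
exponential backoff suggested in the quoted text instead of a fixed
pause. Nothing here is zdaemon or ZEO code: the function name
backoff_delays and its default values (1 second initial delay,
doubling, capped at 5 minutes) are assumptions chosen for illustration.

```python
import itertools


def backoff_delays(initial=1.0, factor=2.0, maximum=300.0):
    """Yield an endless series of pause lengths (seconds) before each restart.

    Hypothetical helper: each delay is `factor` times the previous one,
    capped at `maximum`.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


# The first few pauses with the default (assumed) settings.
print(list(itertools.islice(backoff_delays(), 5)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The supervisor would sleep for the next delay before each restart and
start over with a fresh generator once the child has stayed up for a
while, so a transient outage does not leave it waiting long afterwards.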

I'm not sure what should be the default -- as a developer, I prefer
that it stops (and I hate that zdaemon is the default at all), but for
a production site something different might be in order.

--Guido van Rossum (home page: http://www.python.org/~guido/)