[Zope] Static fail-over

sean.upton@uniontrib.com
Tue, 22 Jan 2002 10:11:18 -0800


The half-dead Zope problem could seemingly be solved with something like
Mon, where each box is responsible for monitoring its own resources; upon
failure of any service (detected by a custom monitor script), you could
trigger an alert script that stops Zope or even downs the network
interfaces.  Provided that not all of your boxes fail the same way at the
same time, this seems pretty safe.
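A rough sketch of what such a monitor could look like (not Mon itself --
the port, the minimal HTTP request, and the alert hook below are all
illustrative assumptions; Mon's real monitors are external scripts):

```python
# Sketch of a self-monitoring probe in the spirit of Mon. The port and
# the minimal HTTP request are assumptions for illustration only.
import socket

ZOPE_ADDR = ("127.0.0.1", 8080)  # assumed local Zope HTTP port

def status_ok(first_bytes):
    """True for a 2xx/3xx HTTP/1.x status line, False for e.g. a 503."""
    return first_bytes.startswith(b"HTTP/1.") and first_bytes[9:10] in b"23"

def zope_alive(timeout=5.0):
    """Probe the local Zope port. A half-dead Zope that accepts the
    connection but never answers fails on the recv timeout."""
    try:
        sock = socket.create_connection(ZOPE_ADDR, timeout=timeout)
        try:
            sock.settimeout(timeout)
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            return status_ok(sock.recv(64))
        finally:
            sock.close()
    except OSError:
        return False

# On a failed probe, a Mon-style alert script would then stop Zope
# (e.g. via an assumed "/etc/init.d/zope stop") or down the interface,
# so the remaining boxes stop treating this one as alive.
```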

Sean

-----Original Message-----
From: Toby Dickenson [mailto:tdickenson@devmail.geminidataloggers.co.uk]
Sent: Tuesday, January 22, 2002 3:16 AM
To: Chris McDonough
Cc: Anthony Baxter; Terry Hancock; zope@zope.org
Subject: Re: [Zope] Static fail-over


On Mon, 21 Jan 2002 19:48:41 -0500, "Chris McDonough"
<chrism@zope.com> wrote:

>Linux Virtual Server works well also as a load balancer.  Additionally,
>Toby Dickenson's recent patch for Zope

hey, that's me.

> that allows it to peer with Squid as
>an ICP server (http://www.zope.org/Members/htrd/icp) is a really
>interesting thing; it allows for easy failover at the app level

Failover seems to be working well.

>as well as rudimentary
>balancing at the IP level.

load *balancing* was an unexpected bonus, and I am surprised how well
it works. I actually developed the system because I wanted the
opposite: load *clustering*.

I have some large objects that are expensive to transfer from my ZEO
server. It is better to send all the requests relating to that object
to Zopes that already have the object cached, even if that leads to an
uneven load distribution.
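The effect can be sketched abstractly (the names and the least-loaded
fallback are illustrative assumptions, not the actual ICP mechanics,
where Squid asks its peers and takes the HITs):

```python
# Abstract sketch of "load clustering": route a request for an object
# to a backend that already has it cached (the ICP HIT case), and only
# fall back to spreading load when nobody has it. All names here are
# illustrative.
def pick_backend(obj_path, backends, cache_of, load_of):
    """backends: list of backend names; cache_of[b]: set of paths
    cached on b; load_of[b]: current load metric for backend b."""
    holders = [b for b in backends if obj_path in cache_of[b]]
    pool = holders or backends   # prefer cache hits, accept uneven load
    return min(pool, key=lambda b: load_of[b])
```

So two Zopes can end up unevenly loaded on purpose: re-fetching the
large object from the ZEO server would cost more than the imbalance.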

>  In comparison, LVS requires that you poll each
>Zope behind the balancer every so often if you want to detect a failure and
>react, and unlike any of the fancy $30k load balancers it doesn't look at
>HTTP response codes to decide if a server returning a "503" error, for
>instance, can be taken out of rotation.  Zope as an ICP peer has this
>built-in because it takes the first non-error response it gets from any
>Squid peer.

That's the goal, but my ICP patches aren't quite there yet. It is
possible for a half-dead Zope to still respond to ICP.

I think the ideal solution would use both methods:

ICP involves a check before *every* request, so the check has to be
quick and it can't be very thorough. However, it can respond quickly
to catastrophic failures.

Separately, perform more thorough checks every so often, where you can
tune the frequency of the checks against the cost and thoroughness of
each one. I can't think of anything better than the LVS-style poll,
since a poll along the data path exercises the whole system. Squid has
a similar system, but it is not yet merged into the trunk:

http://squid.sourceforge.net/rproxy/backend.html#healtcheck
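A minimal sketch of that second, slower layer, assuming an HTTP health
URL, interval, and callbacks that are purely illustrative:

```python
# Sketch of an LVS-style thorough poll: fetch a real page so the whole
# data path (balancer -> Zope -> ZEO) is exercised. URL, interval, and
# callbacks are illustrative assumptions, not part of LVS itself.
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://127.0.0.1:8080/"  # assumed backend check URL
POLL_INTERVAL = 30.0                   # tune against the check's cost

def healthy_status(code):
    """2xx/3xx counts as healthy; a 503 means up but unhealthy."""
    return 200 <= code < 400

def thorough_check(url=HEALTH_URL, timeout=10.0):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return healthy_status(resp.status)
    except urllib.error.HTTPError:
        return False   # e.g. 503: reachable, but take out of rotation
    except (urllib.error.URLError, OSError):
        return False   # down or unreachable

def poll_loop(mark_good, mark_bad):
    """Every POLL_INTERVAL seconds, move the backend in or out of
    rotation based on the thorough check."""
    while True:
        (mark_good if thorough_check() else mark_bad)()
        time.sleep(POLL_INTERVAL)
```

The quick ICP check catches catastrophic failures immediately; this
loop catches the half-dead cases the ICP layer misses.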

Toby Dickenson
tdickenson@geminidataloggers.com

_______________________________________________
Zope maillist  -  Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )