ZODB/ZSS High Availability, was: RE: [Zope] Zope Myths?

sean.upton@uniontrib.com sean.upton@uniontrib.com
Thu, 12 Sep 2002 14:21:28 -0700


I have been doing a lot of thinking about ZODB/storage/ZSS replication
lately, but I haven't had a chance to implement these practices yet, so your
mileage may vary, and your insights and opinions may well differ from these
thoughts...

If the thing that makes replication hard is constant change of lots of
interdependent data, a meaningful snapshot system as close to the database
software as possible (i.e. DirectoryStorage's snapshots, not LVM's) likely
mitigates that risk by providing reasonable assurance of atomicity.  Since
the replication process itself can have problems part way through a transfer
(as a low-tech solution like find+cpio over NFS would), it is up to the
sysadmin to write scripts to:
	1 - Keep multiple areas for replication
		-> Stage the entire replication in a temp
		   dir before putting it in the place where
		   it is used by the ZSS software
			-> since there is no way to do a
			   transactional file copy of multiple
			   files, how about using symlinks, and
			   moving the symlink on completion of a
			   full, atomic transfer and a completed
			   storage consistency check? (A sketch
			   of this follows the list.)
	2 - Have clustering software resource takeover scripts
	    (i.e. heartbeat resource scripts) evaluate:
		a. whether the storage it is about to use is good, &
		b. if the last transfer failed, use the last
		   _good_ full replicated set of files.
		c. The above two checks must be done before starting
		   the ZSS process on the backup server node.
		   (A second sketch, after the next paragraph,
		   shows this takeover check.)
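To make the symlink idea in (1) concrete, here is a rough Python sketch of
the staging + symlink swap.  The paths, the rsync call, and the
verify_storage() check are placeholders for whatever transfer and
consistency tools you actually use; I haven't implemented this yet, so treat
it as a sketch, not working tooling:

import os
import shutil
import subprocess
import time

REPLICA_ROOT = "/var/zss/replicas"                    # placeholder; one dir per transfer
CURRENT_LINK = os.path.join(REPLICA_ROOT, "current")  # the path ZSS is configured to read

def verify_storage(path):
    """Placeholder consistency check; in practice, run whatever checker
    fits your storage (e.g. a DirectoryStorage/FileStorage check) and
    return True only if it passes."""
    return os.path.isdir(path) and bool(os.listdir(path))

def replicate(source):
    """Copy a snapshot into a fresh staging dir, verify it, then atomically
    repoint the 'current' symlink.  A failed transfer never disturbs the
    last good replica."""
    stage = os.path.join(REPLICA_ROOT, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(stage)
    try:
        # low-tech transfer; find+cpio or rsync over NFS amounts to the same thing
        subprocess.check_call(["rsync", "-a", source + "/", stage + "/"])
        if not verify_storage(stage):
            raise RuntimeError("consistency check failed for %s" % stage)
    except Exception:
        shutil.rmtree(stage, ignore_errors=True)      # keep only known-good sets
        raise
    # Build the new symlink next to the old one and rename() it into place;
    # rename() over an existing symlink is atomic on POSIX, so ZSS sees
    # either the old good set or the new good set, never a half-copied one.
    tmp_link = CURRENT_LINK + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(stage, tmp_link)
    os.rename(tmp_link, CURRENT_LINK)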

Mostly, I can't see how shared storage (DAS/SAN) can provide the same
risk-avoidance levels that could be achieved with the above practices, unless
you have some way of mirroring the last good copy of your ZODB storage
within the same shared storage (replication between two places on the same
storage; I assume snapshots and scripts on the secondary node to check the
consistency of the storage/db, like 2(a) above, could come in handy for this
too)?
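And for 2(a)-2(c), the takeover side might look something like the sketch
below.  Again, these are made-up helper names: verify_storage() is the same
placeholder check as above, and start_zss() stands in for however you launch
the storage server process on the backup node:

import os

REPLICA_ROOT = "/var/zss/replicas"
CURRENT_LINK = os.path.join(REPLICA_ROOT, "current")

def verify_storage(path):
    """Same placeholder consistency check as in the replication sketch."""
    return os.path.isdir(path) and bool(os.listdir(path))

def start_zss(path):
    """Placeholder: launch the ZSS (storage server) process against the
    chosen replica directory."""
    raise NotImplementedError

def last_good_replica():
    """Walk replica dirs newest-first and return the first one that
    passes the consistency check (2a/2b)."""
    for name in sorted(os.listdir(REPLICA_ROOT), reverse=True):
        if name.startswith("current"):    # skip the symlink itself
            continue
        path = os.path.join(REPLICA_ROOT, name)
        if verify_storage(path):
            return path
    return None

def takeover():
    """What a heartbeat resource script could call on failover (2c):
    never start ZSS against a replica that hasn't been verified."""
    target = os.path.realpath(CURRENT_LINK)
    if not verify_storage(target):        # the last transfer may have failed
        target = last_good_replica()
        if target is None:
            raise RuntimeError("no consistent replica; refusing to start ZSS")
    start_zss(target)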

Sean

-----Original Message-----
From: tomas@fabula.de [mailto:tomas@fabula.de]
Sent: Thursday, September 12, 2002 1:55 PM
To: Bill Anderson
Cc: sean.upton@uniontrib.com; pw_lists@slinkp.com; zope@zope.org
Subject: Re: [Zope] Zope Myths?


On Thu, Sep 12, 2002 at 11:12:27AM -0600, Bill Anderson wrote:
> On Thu, 2002-09-12 at 00:21, tomas@fabula.de wrote:
> > On Wed, Sep 11, 2002 at 03:46:43PM -0700, sean.upton@uniontrib.com wrote:

[Big hardware vs. replication]

> Well, in that case, your network is a single point of failure, too. :^)

Assuming just one network, assuming just one connectivity provider.
The problem is that replication solutions depend heavily on the type of
application (slowly changing sets of files being the easiest, and
rapidly changing data sets with complex interdependencies (e.g.
high-volume databases) the hardest).

> Expensive? Well, that depends on what you are doing. For under $35,000 you
> can have just shy of 1TB of file space, with snapshot capability,
> multi-machine failover, and a whole lot more. That cost includes two
> machines running Linux with failover. It depends on your needs, and
> your uptime/availability requirements.

...or you can have ten cheapo off the shelf servers hosted at wildly
different places... (OK, it's more like 0.5TB then ;)

> For example, if you are running a site like cbsnewyork.com, 25-35 grand
> is not that much. If you are running a small site, then you don't need
> it. My point was that it (it being ZEO/ZODB/ZOPE) _can_ scale to that.
> I've done it.
> 
[...]
> 
> Well, speaking as a former tester of SAN technology, it would appear
> things have changed dramatically since your experiences. :)
> 

Yes, but this was mainly my point: if you have access to knowledge and
experience with those things, then you may go for it. If you don't...
it's just a point against it.

> The configuration/setup is essentially the same as with SCSI, in fact,
> Fiber channel uses the SCSI subsystem in the OS. The underlying system
> is as robust as the SCSI system, since it is SCSI just over a different
> medium.
> 
> 
[RAID system doing funny things]
> 
> I've never seen this with the Fiber Channel Arrays I dealt with. But
> then again, they had two or more controllers. :^)

Of course not -- but I take it that you *know* what you are doing. This
vendor didn't (at some point I realized that), but heck, it wasn't my
job, and I had enough on my own plate.

[...]

> Same thing with fibre channel SAN tech, the range is measured in miles.
> I know of several SANs that are spread over multiple states. You can
> literally have a fail over datacenter.

Yes. It's a tradeoff. I just wanted to point out that experience with
those things is one of the points to consider (besides application type,
requirements and cost).

It'd be interesting to know (I'm not a Zope guy) how well Zope as an
application would play in each camp.

Thanks
-- tomas