[Zope] large installations and conflict errors

Andrew Langmead alangmead at boston.com
Mon Aug 8 20:03:36 EDT 2005


On Aug 8, 2005, at 10:01 AM, M. Krainer wrote:
> So far our story, but what I really wonder is if there's anyone out
> there who has a similarly large installation. Please let me know how
> large your Zope instance is and what you have done to increase your
> (write) performance. Also, any ideas that may help us are welcome.


We have a ZODB that packs down to about 30 gigs. The unpacked size
grows by about 10 gigs a week, which shows that there is a lot of
write activity in our environment too. We have three Zope instances
as ZEO clients (1.4 GHz PIIIs, each with about two gigs of RAM). A
load balancer in front of those machines is set to favor certain URL
prefixes towards the same machine. This somewhat unbalanced setup for
the load balancer improves the chance that the ZEO client cache will
already have the appropriate object, avoiding a trip to the ZODB for
it. (On these machines, the ZEO client cache is set to 2GB and a
cache flip occurs maybe twice a week.)
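
To give a rough idea of the affinity rule, it amounts to something
like the sketch below (the hostnames and URL prefixes are made up for
illustration; this isn't our actual balancer configuration):

ZEO_CLIENTS = ['zope1.example.com', 'zope2.example.com', 'zope3.example.com']

# Sections whose objects should stay warm in one client's caches.
PREFIX_AFFINITY = {
    '/news': 'zope1.example.com',
    '/sports': 'zope2.example.com',
}

def pick_backend(path, request_number):
    for prefix, host in PREFIX_AFFINITY.items():
        if path.startswith(prefix):
            return host
    # Anything without an affinity rule is spread round-robin.
    return ZEO_CLIENTS[request_number % len(ZEO_CLIENTS)]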

Three other machines handle purely non-interactive tasks (either
through wget or through the Scheduler product). If possible, these
machines are set up with a single Zope thread and a large memory
cache (instead of the standard setup of four threads with x MB each,
it is one thread with x*4 MB). Not only does this help with the speed
of a request, but it prevents each thread's private object cache from
having duplicate copies of the same object. (These machines also have
a 2GB ZEO client cache, but it flips daily.)
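
In ZODB terms the idea is one connection with a big object cache
rather than several small ones. A sketch, using a current ZEO/ZODB
API and made-up numbers rather than our literal configuration:

from ZEO.ClientStorage import ClientStorage
from ZODB.DB import DB

# Roughly a 2GB on-disk ZEO client cache, as on the other boxes.
storage = ClientStorage(('zeo-server', 8100),
                        cache_size=2000 * 1024 * 1024)

# One connection (one worker thread) with a 4x object cache, instead
# of four connections each holding its own copy of the same objects.
db = DB(storage, pool_size=1, cache_size=400000)
conn = db.open()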

A ZCatalog has an index that is a single large Zope object; losing
it from the cache will cause a lot of pain when you need it again.
Although we don't use QueueCatalog, I can see the advantage of having
it concentrate a lot of catalog work in a single thread and
transaction.

Zope's optimistic transactions assume that a request will complete
relatively quickly, and that the likelihood of two entirely separate
requests accessing the same object is slim. I like to think of it as
the assumption that it is hard for two lightning bolts to hit the
same place at the same time. The two ways you can run afoul of this
assumption are to have one object whose modification is greatly
favored over others, or to have requests that take much longer than
average.
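
To see the lightning-bolt case concretely, here is a small sketch
(made-up code against a current ZODB API, not anything from our
setup) of two "requests" tripping over the same hot object:

import transaction
from ZODB.DB import DB
from ZODB.MappingStorage import MappingStorage
from ZODB.POSException import ConflictError

db = DB(MappingStorage())

# Seed a single "hot" object that every request wants to modify.
setup = db.open()
setup.root()['counter'] = 0
transaction.commit()
setup.close()

# Two concurrent "requests", each with its own connection/transaction.
tm1 = transaction.TransactionManager()
tm2 = transaction.TransactionManager()
c1 = db.open(transaction_manager=tm1)
c2 = db.open(transaction_manager=tm2)

c1.root()['counter'] += 1    # request 1 modifies the hot object
c2.root()['counter'] += 1    # request 2 modifies the same object
tm1.commit()                 # the first committer wins
try:
    tm2.commit()             # the second one gets a ConflictError
except ConflictError:
    tm2.abort()              # Zope would retry the whole request here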

I've had to investigate object hotspots before, and what I've found
useful is running fsdump.py on an unpacked version of the database:

fsdump.py var/storage/var/Data.fs | sed -n 's/.*data #[0-9]*//p' \
    | sort | uniq -c | sort -n

Then I look for particular oids that occur in the fsdump output much
more frequently than the rest. Once you've found the hot objects, you
can look back through the fsdump.py log to find the transactions they
belong to and the URLs associated with them. Once you've found the
code paths that are all modifying the same object, the changes needed
to make the object less hot are application specific.
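
If the sed/sort pipeline gets unwieldy, the same counting can be done
with a few lines of Python. This is just a throwaway helper I'm
sketching here, not something that ships with Zope, and you may need
to adjust the regex to match your fsdump output:

import re
import sys

# Feed it fsdump output:  fsdump.py Data.fs | python hot_oids.py
oid_re = re.compile(r'oid=(\S+)')

counts = {}
for line in sys.stdin:
    m = oid_re.search(line)
    if m:
        oid = m.group(1)
        counts[oid] = counts.get(oid, 0) + 1

# Print the most frequently rewritten oids last, like sort -n does.
for n, oid in sorted((n, oid) for oid, n in counts.items()):
    print('%8d  %s' % (n, oid))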

Requests that take so long that they start to interfere with other
requests might be found with requestprofiler.py and the ZopeProfiler
product. Once they are found, standard code optimization techniques
are needed to speed them up.

That's about all I can think of writing at the moment, but if you  
have anything you want to ask me, give me a yell.

