[Gsoc] ZODB-related project ideas

Dirceu Pereira Tiegs dirceutiegs at gmail.com
Wed Mar 19 17:55:31 EDT 2008


Hello,

I'm the "victim" that Sidnei was talking about here [1] :-).

I want to help ZODB through a GSoC project. Sidnei already listed my  
ideas on the wiki [2] (and most of them are from the ZODB blueprints  
[3]), but I would like to know which is the best.

My experience: I've been using ZODB for about two years (with Zope and Plone),  
but I don't have much experience with its internal code. I'm working on  
my graduation thesis this semester; its subject is "Replication of OO  
Databases", and I'm using ZODB (and probably ZEORaid) to  
demonstrate it. Because of this I'm getting more and more  
familiar with the ZODB code.

So, the list:

= Improved replication for ZODB through ZEO RAID =

The ZEO RAID storage is a proxy storage that works like a RAID  
controller by creating a redundant array of ZEO servers. The  
redundancy is similar to RAID level 1, except that each ZEO server  
keeps a complete copy of the database. This proposal aims to improve  
gocept.zeoraid by fixing bugs, adding tests and implementing missing features.
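To make the RAID-1 analogy concrete, here is a minimal sketch of the fan-out idea: every write is replicated to all backend storages, and reads fall back through the backends. All names are illustrative; this is not gocept.zeoraid's actual API (a real proxy storage also handles two-phase commit, degraded backends and recovery).

```python
class RaidStorageSketch:
    """Toy RAID-1-style proxy over several backend 'storages' (dicts here)."""

    def __init__(self, backends):
        self.backends = list(backends)

    def store(self, oid, data):
        # RAID level 1: replicate every write to every backend.
        for backend in self.backends:
            backend[oid] = data

    def load(self, oid):
        # Serve the read from the first backend that has the object;
        # a real implementation would mark missing backends as degraded.
        for backend in self.backends:
            if oid in backend:
                return backend[oid]
        raise KeyError(oid)

backends = [{}, {}]
raid = RaidStorageSketch(backends)
raid.store('obj1', b'pickle-bytes')
```

After the `store` call, both backends hold an identical copy, so any single backend can fail without data loss.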

= RelStorage support for Microsoft SQL Server =

RelStorage is a storage implementation for ZODB that stores pickles in  
a relational database. This proposal aims to add support for  
Microsoft SQL Server in order to take advantage of its replication and  
deployment infrastructure.
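The "pickles in a relational database" idea can be sketched in a few lines. This uses SQLite only as a stand-in backend, and the table and column names are illustrative (RelStorage's real schema and API differ): object state is keyed by object id and transaction id, and the current revision is simply the one with the highest tid.

```python
import sqlite3

# Illustrative schema: one row per (object id, transaction id) revision.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE object_state (
        zoid  INTEGER NOT NULL,   -- object id
        tid   INTEGER NOT NULL,   -- transaction id
        state BLOB,               -- the object's pickle
        PRIMARY KEY (zoid, tid)
    )
""")

def store(zoid, tid, pickle_bytes):
    conn.execute(
        "INSERT INTO object_state (zoid, tid, state) VALUES (?, ?, ?)",
        (zoid, tid, pickle_bytes))

def load_current(zoid):
    # The highest transaction id is the current revision.
    row = conn.execute(
        "SELECT state FROM object_state WHERE zoid = ? "
        "ORDER BY tid DESC LIMIT 1", (zoid,)).fetchone()
    return row[0] if row else None

store(1, 100, b'old-pickle')
store(1, 200, b'new-pickle')
```

Because the data lives in ordinary tables, the database's own replication and backup tooling applies to it, which is the attraction of a SQL Server backend.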

= Allow limiting the size of ClientStorage's blob cache =

Currently the ClientStorage will grow the blob cache indefinitely. We  
should allow some control over how large the cache may grow, e.g. by  
making ClientStorage support minimizing the blob cache (idea: an LRU  
policy based on the stat access time, combined with a size-based threshold).

 From https://blueprints.launchpad.net/zodb/+spec/limit-clienstorage-blob-cache
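A minimal sketch of that cleanup idea, assuming a directory of cached blob files: when the cache exceeds a size threshold, evict the least-recently-accessed files first, using the stat access time as the LRU signal. The function name and demo are hypothetical, not ClientStorage's actual interface.

```python
import os
import tempfile

def shrink_blob_cache(cache_dir, max_bytes):
    """Delete least-recently-accessed files until the cache fits max_bytes."""
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        st = os.stat(path)
        entries.append((st.st_atime, st.st_size, path))
    total = sum(size for _, size, _ in entries)
    # Smallest access time first = least recently used.
    for _, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
    return total

# Demo: three 100-byte files with a 150-byte limit -> the two least
# recently accessed files get evicted.
cache = tempfile.mkdtemp()
for age, name in enumerate(['oldest', 'middle', 'newest']):
    path = os.path.join(cache, name)
    with open(path, 'wb') as f:
        f.write(b'x' * 100)
    os.utime(path, (age, age))  # force distinct, known access times

remaining = shrink_blob_cache(cache, max_bytes=150)
```

One caveat a real implementation would face: many systems mount filesystems with atime updates disabled or relaxed, so the access time may need to be maintained by the cache itself rather than read from stat.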

= An alternative object caching strategy for ZODB: object classification =

Every ZODB connection maintains a cache that keeps objects in memory  
to avoid unnecessary IO work. All objects that are loaded within a  
transaction are put into the cache and are evicted from the cache at  
certain points (e.g. transaction boundaries) when cache minimization  
is performed.

The currently implemented strategy for evicting objects is least  
recently used (LRU). The size of the object cache is limited by the  
number of objects, and this size is configurable per database. Large  
applications have varying usage patterns on objects, so the LRU  
strategy does not always fit those patterns effectively. Examples are:

  * Applications with a variety of smaller and larger objects. The  
classical example of files stored as pdata constructs defeats the LRU  
caching strategy and potentially evicts heavily used (or even all)  
objects from the cache just to allow loading many smaller,  
infrequently used objects.
  * Catalogs: light-weight objects like brains might benefit from a  
higher priority for staying cached in comparison to other objects.

This proposal aims to modify the ZODB to allow different cache  
implementations to be used, and to implement a cache that performs  
object classification using a configurable strategy.

 From https://blueprints.launchpad.net/zodb/+spec/classifying-object-cache
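A toy sketch of what such a classifying cache could look like: objects are grouped by a configurable classifier, each class gets a priority, and eviction takes the LRU entry from the lowest-priority class first, so bulk loads of low-value objects (like pdata chunks) cannot flush out high-value ones (like brains). The class names, priority scheme and API here are all illustrative, not an existing ZODB interface.

```python
from collections import OrderedDict

class ClassifyingCache:
    def __init__(self, max_objects, classify, priorities):
        self.max_objects = max_objects
        self.classify = classify      # callable: obj -> class name
        self.priorities = priorities  # class name -> int; lower evicts first
        self.buckets = {}             # class name -> OrderedDict in LRU order

    def put(self, oid, obj):
        cls = self.classify(obj)
        bucket = self.buckets.setdefault(cls, OrderedDict())
        bucket[oid] = obj
        bucket.move_to_end(oid)  # mark as most recently used
        self._shrink()

    def _shrink(self):
        # Evict LRU entries from the lowest-priority non-empty class first.
        while sum(len(b) for b in self.buckets.values()) > self.max_objects:
            cls = min((c for c, b in self.buckets.items() if b),
                      key=lambda c: self.priorities.get(c, 0))
            self.buckets[cls].popitem(last=False)

    def __contains__(self, oid):
        return any(oid in b for b in self.buckets.values())

# Demo: with capacity 2, adding a second brain evicts the pdata entry,
# not the older brain, because pdata has the lower priority.
cache = ClassifyingCache(
    max_objects=2,
    classify=lambda obj: obj['kind'],
    priorities={'brain': 10, 'pdata': 1})
cache.put('b1', {'kind': 'brain'})
cache.put('p1', {'kind': 'pdata'})
cache.put('b2', {'kind': 'brain'})
```

A plain LRU would have evicted 'b1' here; the classification is what changes the outcome.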

1 - http://article.gmane.org/gmane.comp.web.zope.zodb/9191
2 - http://wiki.zope.org/gsoc/SummerOfCode2008
3 - https://blueprints.launchpad.net/zodb/

Regards,
--
Dirceu Pereira Tiegs
Weimar Consultoria

Hospedagem Plone, Zope e Python
http://www.pytown.com



