[Zope-dev] Zcatalog bloat problem (berkeleydb is a solution?)

Giovanni Maruzzelli maruzz@open4.it
Mon, 25 Jun 2001 14:34:55 +0200


Hello Zopistas,

we are developing a Zope 2.3.3 (py 1.5.2) application that adds, indexes
and reindexes some tens of thousands of objects (ZClasses that are
DTMLDocuments on steroids) on some twenty properties each day, while the
absolute number of cataloged objects keeps growing (think of content
management for a big portal, where lots of content is added and modified
every day and all the old content remains as a searchable archive and as
material to recycle in the future).

This seems in some respects similar to the task Erik Enge ran into a
couple of weeks ago.

We first derived from CatalogAware, then switched to managing the
cataloging, uncataloging, and recataloging ourselves.
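
For reference, the manual version looks roughly like this (a sketch; the
catalog attribute name and the method name are ours, but catalog_object /
uncatalog_object are the standard ZCatalog methods):

    import string

    def recatalog(self, obj):
        # our ZCatalog instance -- the attribute name is hypothetical
        catalog = self.Catalog
        # ZCatalog keys its entries by a uid; we use the physical path
        path = string.join(obj.getPhysicalPath(), '/')
        catalog.uncatalog_object(path)       # drop the stale entry
        catalog.catalog_object(obj, path)    # reindex under the same uid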

The ZODB still bloats at far too fast a pace.

***Maybe there's something obvious we missed***, but when you have some
four thousand objects in the catalog, adding and cataloging one more object
grows the ZODB by roughly a couple of megabytes (while the object itself is
some 1 KB of text plus some twelve boolean, datetime, and string
properties). If we pack the ZODB, Data.fs returns to an almost normal size,
so the bloat comes from the transactions, as tranalyzer.py confirms.
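
For completeness, the pack we run is just the stock ZODB pack; from an
external script it looks roughly like this (a sketch against a plain
FileStorage, assuming Zope is stopped, otherwise pack through the Control
Panel):

    import ZODB
    from ZODB.FileStorage import FileStorage

    # open the same Data.fs Zope uses
    storage = FileStorage('var/Data.fs')
    db = ZODB.DB(storage)
    db.pack()      # discard superseded transaction records up to now
    db.close()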

Any hints on how to manage something like this?
We use textindexes, fieldindexes, and keywordindexes (textindexes on string
properties, fieldindexes on booleans and datetimes, keywordindexes on
strings). Maybe one kind of index should be avoided?
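
In case it matters, the indexes were created along these lines (a sketch of
the ZCatalog addIndex calls; the property names are just examples, not our
real schema):

    catalog.addIndex('body',     'TextIndex')     # full-text on a string
    catalog.addIndex('created',  'FieldIndex')    # datetime property
    catalog.addIndex('approved', 'FieldIndex')    # boolean property
    catalog.addIndex('tags',     'KeywordIndex')  # multi-valued strings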

Erik, any thoughts?

We have almost decided to switch to the berkeleydb storage (the Minimal
one) to get rid of the bloat, and we are testing with it, but it seems to
be discontinued for lack of demand.
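
If it helps, what we are testing is just a custom_zodb.py along these lines
(a sketch; the module path and constructor assume the bsddb3Storage
package, so double-check against its README):

    # custom_zodb.py, dropped in the Zope directory
    from bsddb3Storage.Minimal import Minimal

    # directory for the Berkeley DB environment -- path is just an example
    Storage = Minimal('/var/zope/bdb')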

Can anyone shed some light on it? Is it production grade?

-giovanni