[Zope-dev] Re: Zope Mailing Lists and ZCatalog

R. David Murray bitz@bitdance.com
Fri, 4 Aug 2000 21:43:25 -0400 (EDT)


On Fri, 4 Aug 2000, Michel Pelletier wrote:
> Andy Dawkins wrote:
> > The problem we have is getting that many objects into the Catalog.  If we
> > load the objects into the ZODB, then catalog them, the machine either runs
> > out of memory or, if we lower the subtransaction threshold, it runs out of
> > hard drive space.

Don't lower the subtransaction threshold too much; because of the way
BTrees work, you wind up generating a *lot* more disk writes than you
would expect.
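A rough way to see why: in an append-only storage, each subtransaction
commit appends a fresh copy of every BTree bucket dirtied since the last
commit, so committing after every object or two rewrites whole buckets
over and over.  A back-of-the-envelope sketch -- the bucket size and
record size below are illustrative assumptions, not real ZODB numbers:

```python
import math

BUCKET_SIZE = 30    # assumed keys per BTree bucket (illustrative)
BUCKET_BYTES = 4096 # assumed on-disk size of one bucket record

def storage_growth(n_objects, commit_every):
    """Approximate bytes appended to an append-only storage while
    cataloging n_objects, committing a subtransaction every
    commit_every objects."""
    commits = math.ceil(n_objects / commit_every)
    # Each commit rewrites every bucket touched since the last one;
    # small batches dirty one bucket per commit, large batches
    # amortize a bucket write over ~BUCKET_SIZE insertions.
    dirty_buckets = max(1, math.ceil(commit_every / BUCKET_SIZE))
    return commits * dirty_buckets * BUCKET_BYTES
```

With these made-up numbers, committing every object while cataloging
61K records appends roughly 30x more data than committing every 1000 --
which is the "runs out of hard drive space" failure mode above.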

I can catalog 61K records (a small amount of data for each record,
though) on a machine with 256MB of memory.  More memory is the
easiest solution...

> > If we use CatalogAware to catalog the objects as they are imported the
> > Catalog explodes to stupid sizes because CatalogAware doesn't support Sub
> > transactions.
> 
> Subtransactions are a storage thing and really don't have anything to
> do with CatalogAware.  If you have a subtransaction threshold set, then
> subtransactions will be used for any cataloging operation, CatalogAware
> or not.

I've imported my whole 61K-object folder tree, and the resulting
Data.fs file was about twice the size of the zexp file.  That
hardly sounds like "exploded", so maybe there's something odd
in the way you are doing the import?  You definitely don't want
to be committing transactions or subtransactions too often.

> > Also as messages arrived over time the Catalog would once again explode
> > dramatically,

This is definitely an issue for something like archiving a mailing list.
It sounds like, in the current state of things, you really want to
move the catalog to a non-transactional storage.

> There isn't anything wrong with the Catalog (for this particular
> problem), or at least, there isn't anything in the catalog to fix that
> would solve your problem.  We've had customers index well over 50,000
> objects; you just have to understand the resource constraints and work
> with them: for example, don't mass index, use storages that scale to
> high-write environments, etc.

There has, however, been at least one posting from DC about the
technology that underlies the catalog, the BTree.  Apparently
there *is* some tuning that can be done to make the BTree generate
fewer object updates when modifications take place (something about
parent objects getting updated unnecessarily, my hazy memory says).
Is any active work being done on BTree?

--RDM