[Zope] Folder with one million Documents?

Thomas Guettler <thomas@thomas-guettler.de>
Sat, 26 Jan 2002 10:20:39 +0100


On Fri, Jan 25, 2002 at 01:53:15PM -0800, sean.upton@uniontrib.com wrote:
> This will be taxing on Zope, so you need to be willing to be patient enough
> to optimize your application a bit.  BTreeFolder works well for this,
> provided you are willing to consider bypassing use of the ObjectManager APIs
> and read/write to BTreeFolder._tree directly or use BTreeFolder._setOb() and
> BTreeFolder._getOb() instead of ObjectManager._getObject()...

Thank you for this information. I will try it on Monday.
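
A minimal sketch of the access pattern described above, for illustration
only. It does not import Zope: SketchBTreeFolder is a hypothetical stand-in
that mimics just the three names mentioned in the message (_tree, _setOb,
_getOb), with a plain dict in place of the real BTree. The document ids and
the bulk_load helper are likewise assumptions.

```python
# Hypothetical stand-in for BTreeFolder so the sketch runs without Zope.
# Only the names _tree, _setOb and _getOb come from the quoted message;
# everything else here is an assumption made for illustration.

class SketchBTreeFolder:
    def __init__(self):
        # The real product keeps its children in a BTree; a dict stands in.
        self._tree = {}

    def _setOb(self, id, ob):
        # Store the child directly, with no per-object bookkeeping.
        self._tree[id] = ob

    def _getOb(self, id, default=None):
        # Look the child up by id without building a full listing.
        return self._tree.get(id, default)


def bulk_load(folder, count):
    """Add `count` placeholder documents through the low-level calls."""
    for n in range(count):
        doc_id = "doc%07d" % n
        folder._setOb(doc_id, {"title": "Document %d" % n})
    return folder


folder = bulk_load(SketchBTreeFolder(), 1000)
print(len(folder._tree))                     # number of stored documents
print(folder._getOb("doc0000042")["title"])  # fetch one document by id
print(folder._getOb("missing") is None)      # unknown ids give the default
```

In a real Zope instance, going around the higher-level ObjectManager calls
probably also skips whatever hooks they trigger (automatic cataloging, for
example), so indexing would likely have to be started separately.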

> 
> You also will REALLY need some nice hardware.  I would suggest the fastest
> box you can get with LOTS of RAM.  I would look at something along the lines
> of a Dual Athlon 2000+ (P-rated, not MHz) box with 3-4 GB RAM, and a striped
> RAID volume of fast disks.  

OK

> 
> I have a BTreeFolder-derived folder and have populated it with about a
> third-of-a-million Cataloged objects, with each object using an underlying
> relational datastore, and about 8 Cataloged indexes, mostly field indexes
> that index the result of a relational query; 

Do you use the relational datastore for performance, or because the
RDBMS was already there before you decided to use Zope? Development
with a Python product is very fast, so I would prefer it if this
worked without an RDBMS.

> I don't think BTreeFolder is a problem, but I would suspect that reindexing
> a Catalog with 3 million documents with full-text search setups would take
> you over 10-15 hours on a fast computer, longer if there is a complex amount
> of filtering document formats involved.
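
As a rough back-of-the-envelope check on the figures quoted above, the
estimate of 3 million documents in 10 to 15 hours can be turned into an
indexing rate and scaled down to one million documents. This assumes the
time grows linearly with the number of documents, which may not hold for
a real Catalog; the function names are made up for this sketch.

```python
def docs_per_second(doc_count, hours):
    """Average indexing rate implied by a total run time."""
    return doc_count / (hours * 3600.0)


def hours_for(doc_count, rate):
    """Run time in hours for doc_count documents at a given rate."""
    return doc_count / rate / 3600.0


# Figures from the quoted estimate: 3 million documents, 10 to 15 hours.
fast = docs_per_second(3000000, 10)
slow = docs_per_second(3000000, 15)
print("rate: %.1f to %.1f documents per second" % (slow, fast))

# Same rates applied to one million documents (linear scaling assumed).
print("one million documents: %.1f to %.1f hours"
      % (hours_for(1000000, fast), hours_for(1000000, slow)))
```

Because the quoted estimate says "over" 10-15 hours, these rates are best
read as upper bounds rather than expected values.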

The documents are Python classes derived from Folder. Only a few of
them really contain files. Most of them just use PropertyManager.

Is it possible to do the cataloging on a different machine? This would
reduce the load on the primary server.

I don't think ZEO is a solution because there is a lot of write access.

Thank you for your answers. I think I will write a small HOWTO once I
get it working.

 thomas


-- 
Thomas Guettler <guettli@thomas-guettler.de>
http://www.thomas-guettler.de