[Zope-dev] 100k+ objects, or...Improving Performance of BTreeFolder...

Phillip J. Eby pje@telecommunity.com
Mon, 10 Dec 2001 09:54:38 -0500


I'm not sure whether your work so far or your future plans already take this 
into account, but in case you were unaware: you do not need to persistently 
store objects in the ZODB in order to index them in a 
ZCatalog.  All that is required is that the object to be cataloged is 
accessible via a URL path.  ZSQL methods can be set up to be 
URL-traversable, and to wrap a class around the returned row.  To load the 
items into the catalog, you can use a PythonScript or similar to loop over 
a multi-row query, passing the objects directly to the catalog along with a 
path that matches the one they'll be retrievable from.  This approach would 
eliminate the need for BTreeFolder altogether, although of course it 
requires access to the RDBMS for retrievals.  Since the cataloged objects 
themselves are never written to the ZODB, it should also reduce the number 
of writes and allow for bigger subtransactions in a given quantity of memory.
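
As a rough illustration of the loading loop described above, here is a minimal 
sketch in plain Python.  It assumes only that the catalog exposes 
catalog_object(obj, uid), as ZCatalog does.  The names CustomerRow, 
FakeCatalog, load_rows, the customer_id column, and the /customers/by_id path 
are all hypothetical stand-ins so the example runs outside Zope; in a real 
setup the rows would come from a ZSQL method and the catalog would be an 
actual ZCatalog.

```python
# Hypothetical stand-ins: none of these classes or names are part of Zope.

class CustomerRow:
    """Wraps one relational row so its columns read as attributes."""

    def __init__(self, row):
        self.__dict__.update(row)


class FakeCatalog:
    """Stand-in that mimics only ZCatalog's catalog_object(obj, uid) call."""

    def __init__(self):
        self.entries = {}

    def catalog_object(self, obj, uid):
        # A real ZCatalog would update its indexes here;
        # this stand-in just remembers which object was cataloged at which path.
        self.entries[uid] = obj


def load_rows(catalog, rows, base_path):
    """Catalog each row under the path it will later be traversable at."""
    count = 0
    for row in rows:
        obj = CustomerRow(row)
        # The path must match the URL the row object is retrievable from.
        path = "%s/%s" % (base_path, row["customer_id"])
        catalog.catalog_object(obj, path)
        count += 1
    return count


# Assumed sample data standing in for a multi-row query result.
rows = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
catalog = FakeCatalog()
loaded = load_rows(catalog, rows, "/customers/by_id")
```

Note that nothing in the loop stores the row objects anywhere persistent; 
only the catalog sees them, which is the point of the approach.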


At 07:36 PM 12/9/01 -0800, sean.upton@uniontrib.com wrote:
>Interesting FYI for those looking to support lots of cataloged objects in
>ZODB and Zope (Chris W., et al)... I'm working on a project to put ~350k
>Cataloged objects (customer database) in a single BTreeFolder-derived
>container; these objects are 'proxy' objects which each expose a single
>record in a relational dataset, and allow about 8 fields to be indexed (2 of
>which are TextIndexes).
>
>...
>
>- Also, I want to make it clear that if I had a data access API that needed
>more than simple information about my datasets (i.e. I was trying to do
>reporting on patterns, like CRM-ish types of applications), I would likely
>wrap a function around indexes done in the RDB, not in Catalog.  My project requires
>no reporting functionality, and thus really needs no indexes, other than for
>finding a record for customer service purposes and account validation
>purposes.  The reason, however, that I chose ZCatalog was for full text
>indexing that I could control/hack/customize easily.  My slightly uninformed
>belief now is that for big datasets or "enterprise" applications (whatever
>that means), I would use a hybrid set of (faster) indexes using the RDB's
>indexes where appropriate (heavily queried fields), and ZCatalog for
>TextIndexes (convenient).   I'm sure inevitable improvements to ZCatalog
>(there seems to be community interest in such) will help here.