[Zope] BTreeFolders + Catalog + lots of objects?

sean.upton@uniontrib.com
Thu, 08 Nov 2001 11:03:37 -0800


The proxy objects exist in the odb to provide the ability to catalog each
record.

Right now, I've worked with sets of up to 20k objects indexed in a Catalog
using several field and globbing text indexes without issue.
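A globbing text index matches wildcard patterns against the individual words of a field, so the order of the words in the stored value does not matter. The following is only a rough sketch of that idea in plain Python; it is not how ZCatalog's text index is implemented, and split_words and glob_match are made-up names for illustration.

```python
import fnmatch
import re


def split_words(text):
    """Lowercase a field value and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def glob_match(pattern, text):
    """Return True if every glob term in the pattern matches some word of the text."""
    words = split_words(text)
    terms = pattern.lower().split()
    return all(
        any(fnmatch.fnmatchcase(word, term) for word in words)
        for term in terms
    )


# Inconsistent spellings of the same customer, plus one unrelated record.
names = ["John Doe", "Doe, John", "DOE JOHN Q", "Jane Smith"]
matches = [name for name in names if glob_match("joh* doe", name)]
```

Here matches holds the three "John Doe" variants and leaves out "Jane Smith", because each search term only has to match some word in the field.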

As for cataloging them (and hoping for the best), I will likely use
sub-transactions in the method in my container that mass-rebuilds (and thus
reindexes) the BTreeFolder's contained objects, which will use some very
small fields in the relational database obtained via methods that grab them
from the database.  Each record in the database is about 400 bytes total, so
fairly small, and I don't plan on storing much metadata in the Catalog, but
I imagine that the search index will be at least 100MB...  The fields are
small, mostly customer information (name, address, that sort of thing), and
the main reason I want to catalog them is that I want customer service reps
to be able to do a globbing full-text search on a portion of a full-name
field that isn't always consistent (i.e. "John Doe" vs. "Doe, John" or all
kinds of other variants).
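The mass-rebuild with sub-transactions described above could look roughly like the sketch below. It is written in plain Python so it runs outside Zope: rebuild_catalog, the fake callbacks and the batch size of 1000 are all assumptions for illustration. Inside a Zope 2 method, catalog_object would presumably be the catalog's own catalog_object(obj, uid) method, and the commit callback something like get_transaction().commit(1).

```python
def rebuild_catalog(objects, catalog_object, commit_subtransaction, batch_size=1000):
    """Catalog every (path, obj) pair, committing a sub-transaction per full batch.

    Returns the number of objects cataloged.
    """
    count = 0
    for path, obj in objects:
        catalog_object(obj, path)
        count += 1
        # Flush pending changes every batch_size objects to bound memory use.
        if count % batch_size == 0:
            commit_subtransaction()
    return count


# Stand-ins so the sketch can be run without a Zope instance.
cataloged = []
commits = []


def fake_catalog_object(obj, path):
    cataloged.append(path)


def fake_commit():
    commits.append(len(cataloged))


proxies = [("/crm/%d" % i, {"id": i}) for i in range(1, 2501)]
total = rebuild_catalog(proxies, fake_catalog_object, fake_commit, batch_size=1000)
```

With 2500 proxies and a batch size of 1000, the commit callback fires after the 1000th and 2000th objects; the remaining 500 are left to the enclosing transaction's final commit.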

Sean

-----Original Message-----
From: Chris Withers [mailto:chrisw@nipltd.com]
Sent: Thursday, November 08, 2001 12:13 AM
To: sean.upton@uniontrib.com
Cc: zope@zope.org
Subject: Re: [Zope] BTreeFolders + Catalog + lots of objects?


sean.upton@uniontrib.com wrote:
> 
> Has anyone used the BTreeFolder product to store hundreds-of-thousands or
> millions of objects?

Nope.

> I'm developing an internal CRM system that will contain somewhere between
> 300k-500k records, stored in a back-end relational datastore, and exposed
> via metadata proxy objects (1-per-record) sitting in a container subclassed
> from BTreeFolder; 

Is there any particular reason the proxy objects need to live in the ZODB?

> I am wondering if anybody has done anything similar to this, in terms of
> number of objects stored in a BTreeFolder, and the type of storage that they
> used.  I'm also wondering about anyone using ZCatalog for such a large
> number of indexed objects.

I'm having infinite amounts of fun (not!) attempting to index 40,000 word
documents. It largely depends on what types of indexes you will be using on
these objects and what size the objects themselves are.

If the objects are anything more than extremely simple and small proxies, I
reckon you'll run into problems with BTreeFolder. Likewise, if your indexing
is anything other than simple Field indexing (and with that number of objects,
even that may be enough to cause trouble) you won't successfully index that
many objects.

I'd go for sticking the lot in your relational backend.

cheers,

Chris (who's had his faith in ZODB scalability systematically destroyed over
the last couple of months :-( )
> 
> I'm also thinking of overriding BTreeFolder.manage_main_listing() with a
> user interface that allows users to simply type an object id into a text
> box, instead of listing them (the string object ids correspond to a long
> integer from 1..n); it wouldn't work very well to list half-a-million
> objects in a html-form select control...  Ideally, I'd do batching of some
> sort, but I'm having trouble figuring out how I would do that.
> 
> Anyone have any thoughts?
> 
> Sean
> 
> =========================
> Sean Upton
> Senior Programmer/Analyst
> SignOnSanDiego.com
> The San Diego Union-Tribune
> 619.718.5241
> sean.upton@uniontrib.com
> =========================
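
On the batching question in the quoted message: since the ids are the strings of the integers 1..n, one page of ids can be computed arithmetically without ever listing the folder. The sketch below is only an illustration of that arithmetic; batch_ids and its parameters are hypothetical names, and it assumes the ids are contiguous with no gaps from deleted records.

```python
def batch_ids(total, start=1, size=50):
    """Return one page of ids, plus previous/next page starts, for ids 1..total."""
    if total < 1 or size < 1:
        return {"ids": [], "previous": None, "next": None}
    # Clamp the requested start into the valid id range.
    start = max(1, min(start, total))
    end = min(start + size - 1, total)
    return {
        "ids": [str(i) for i in range(start, end + 1)],
        "previous": max(1, start - size) if start > 1 else None,
        "next": end + 1 if end < total else None,
    }


# A page from the middle and a page at the very end of 500,000 records.
page = batch_ids(500000, start=101, size=50)
last = batch_ids(500000, start=499990, size=50)
```

A listing page would then only need to look up the 50 or so ids in page["ids"] and render links to the previous and next starts.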