[Zope-dev] Searching/Indexing/ZODB/SQL/BerkleyDB

Chris Withers chrisw@nipltd.com
Wed, 28 Nov 2001 14:37:57 +0000


Casey Duncan wrote:
> 
> > > I would be willing to help both in coding and getting the code put into
> > > the Zope core.
> >
> > <raises hand> me too!

Me three! :-)

Just to put my take on all of this...

As some of you may know, I've been looking at indexing for a while now in one
way or another...

> > I'm interested in this too, and I'm keen to get a solution that will
> > work with just the ZODB, without needing all of Zope.
> 
> Yes, I second, third and fourth that motion. I have a bunch of ideas kicking
> around for ZODB-level indexing. Let's talk more. 

I don't believe this is a good idea anymore, especially once you get into any
significant amount of data.
ZODB simply doesn't seem to scale to indexing very well. You all have no doubt
experienced this with ZCatalog TextIndexes... I have a more flexible and
pluggable indexer written for ZODB (not only Zope! ;-) but it didn't scale to
anything like what I needed :-(

FileStorage goes through RAM at a rate of knots. Jim has a patch for this, but I
haven't had a chance to stress test it yet.
bsddb2Storage currently hammers the disk, meaning it has worse performance when
indexing than FileStorage ;-)

I'm currently working on a MySQL-based full text indexer with phrase matching,
and potentially wildcards some time soon. For me, once this is cracked,
FieldIndexes and the like are trivial in SQL and I intend to encapsulate the
whole thing in a Python class for ease of use. This is what I think might be the
best solution: relational databases do tables well, and tables are what indexing
is all about.
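
To make the "indexing is tables" idea concrete, here is a minimal sketch of a
positional word index wrapped in a Python class. It is an illustration only, not
the indexer described above: it uses the standard-library sqlite3 module as a
stand-in for MySQL, and the class name TextIndex, the postings table and both
method names are made up for the example. Phrase matching is done by joining the
table to itself once per extra word and requiring consecutive positions.

```python
import re
import sqlite3


class TextIndex:
    """Hypothetical positional word index kept in a single SQL table."""

    def __init__(self, conn):
        self.conn = conn
        # One row per word occurrence: which word, in which document, at which position.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS postings ("
            "word TEXT NOT NULL, doc_id INTEGER NOT NULL, pos INTEGER NOT NULL)"
        )
        conn.execute(
            "CREATE INDEX IF NOT EXISTS postings_word "
            "ON postings (word, doc_id, pos)"
        )

    @staticmethod
    def _words(text):
        # Very naive tokenizer (assumption): lowercase runs of word characters.
        return re.findall(r"\w+", text.lower())

    def index(self, doc_id, text):
        # Re-indexing a document replaces its old rows.
        self.conn.execute("DELETE FROM postings WHERE doc_id = ?", (doc_id,))
        self.conn.executemany(
            "INSERT INTO postings (word, doc_id, pos) VALUES (?, ?, ?)",
            [(word, doc_id, pos) for pos, word in enumerate(self._words(text))],
        )

    def search_phrase(self, phrase):
        words = self._words(phrase)
        if not words:
            return []
        # Word k of the phrase must sit k positions after the first word,
        # in the same document: one self-join per additional word.
        sql = ["SELECT DISTINCT p0.doc_id FROM postings p0"]
        params = []
        for k in range(1, len(words)):
            sql.append(
                f"JOIN postings p{k} ON p{k}.doc_id = p0.doc_id "
                f"AND p{k}.pos = p0.pos + {k} AND p{k}.word = ?"
            )
            params.append(words[k])
        sql.append("WHERE p0.word = ? ORDER BY p0.doc_id")
        params.append(words[0])
        return [row[0] for row in self.conn.execute(" ".join(sql), params)]
```

Usage would look something like idx = TextIndex(sqlite3.connect(":memory:")),
then idx.index(1, "some text") and idx.search_phrase("some text"). A single-word
query is just the degenerate case with no joins; wildcards and scoring are left
out entirely.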

That said, I wasn't aware of Matt's work up until very recently. I'd love to see
an Indexer that didn't require an RDB (or BerkeleyDB :-P) and scaled to gigabytes
of data...

> Perhaps we should arrange an
> "indexing and catalog" chat on #zope.

...definitely. When shall we set a time and date?

cheers,

Chris