[Zope-dev] Searching/Indexing/ZODB/SQL/BerkleyDB

Andreas Jung <andreas@zope.com>
Wed, 28 Nov 2001 09:52:50 -0500


----- Original Message -----
From: "Chris Withers" <chrisw@nipltd.com>
To: "Casey Duncan" <c.duncan@nlada.org>
Cc: "Steve Alexander" <steve@cat-box.net>; "Wolfram Kerber" <wk@gallileus.de>; <zope-dev@zope.org>
Sent: Wednesday, November 28, 2001 09:37
Subject: Re: [Zope-dev] Searching/Indexing/ZODB/SQL/BerkleyDB


> Casey Duncan wrote:
> >
> > > > I would be willing to help both in coding and getting the code put
> > > > into the Zope core.
> > >
> > > <raises hand> me too!
>
> Me three! :-)
>
> Just to put my take on all of this...
>
> As some of you may know, I've been looking at indexing for a while now in
> one way or another...
>
> > > I'm interested in this too, and I'm keen to get a solution that will
> > > work with just the ZODB, without needing all of Zope.
> >
> > Yes, I second, third and fourth that motion. I have a bunch of ideas
> > kicking around for ZODB-level indexing. Let's talk more.
>
> I don't believe this is a good idea anymore, especially once you get into
> any significant amount of data.
> ZODB simply doesn't seem to scale to indexing very well. You have all no
> doubt experienced this with ZCatalog TextIndexes... I have a more flexible
> and pluggable indexer written for ZODB (not only Zope! ;-) but it didn't
> scale anywhere near what I needed :-(
>
> FileStorage goes through RAM at a rate of knots. Jim has a patch for this,
> but I haven't had a chance to stress test it yet.
> bsddb2Storage currently hammers the disk, meaning it performs even worse
> than FileStorage when indexing ;-)
>
> I'm currently working on a MySQL-based full text indexer with phrase
> matching, and potentially wildcards some time soon. For me, once this is
> cracked, FieldIndexes and the like are trivial in SQL, and I intend to
> encapsulate the whole thing in a Python class for ease of use. This is what
> I think might be the best solution: relational databases do tables well,
> and that's what indexing is all about: tables.
>
> That said, I wasn't aware of Matt's work until very recently. I'd love to
> see an Indexer that didn't require an RDB (or BerkeleyDB :-P) and scaled to
> gigabytes of data...
>
> > Perhaps we should arrange an
> > "indexing and catalog" chat on #zope.
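
The "indexing is all about tables" argument can be illustrated with a minimal sketch of a positional inverted index: one row per word occurrence, so that a phrase query becomes a self-join on consecutive positions. This is not Chris's MySQL indexer. It assumes Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table `postings` and the functions `tokenize`, `build_index` and `phrase_search` are hypothetical names chosen for illustration.

```python
import re
import sqlite3

def tokenize(text):
    # Deliberately simple splitter: lowercase, then split on non-word characters.
    return re.findall(r"\w+", text.lower())

def build_index(conn, docs):
    """Store one row per word occurrence: (word, doc_id, position)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings (word TEXT, doc_id INTEGER, pos INTEGER)")
    # Composite index so lookups by word, then document and position, stay cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_postings ON postings (word, doc_id, pos)")
    for doc_id, text in docs.items():
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(word, doc_id, i) for i, word in enumerate(tokenize(text))],
        )
    conn.commit()

def phrase_search(conn, phrase):
    """Return sorted ids of documents containing the words of `phrase` consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    # One alias of postings per word; alias k must sit exactly k positions
    # after alias 0 in the same document.
    tables = ", ".join(f"postings p{k}" for k in range(len(words)))
    conds = ["p0.word = ?"]
    for k in range(1, len(words)):
        conds.append(
            f"p{k}.word = ? AND p{k}.doc_id = p0.doc_id AND p{k}.pos = p0.pos + {k}")
    sql = (f"SELECT DISTINCT p0.doc_id FROM {tables} WHERE "
           + " AND ".join(conds) + " ORDER BY p0.doc_id")
    return [row[0] for row in conn.execute(sql, words)]
```

For example, after `build_index(conn, {1: "ZODB does not scale to indexing", 2: "Indexing is all about tables", 3: "tables about indexing"})` on an in-memory connection, `phrase_search(conn, "about tables")` returns `[2]`: document 3 contains both words, but not in that order. Field-style indexes would be ordinary columns in further tables, which is the sense in which they become "trivial in SQL".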

Storage of indexed data is one aspect, but there is also a need for components
like lexers, stemmers, splitters etc. Oracle Intermedia, for example, has a
very flexible architecture for handling these components (for all that
Oracle Intermedia sucks). It would also be interesting to catalog structured
documents (e.g. XML) so that queries can involve structural information.
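
A pluggable component architecture of this kind can be sketched as a small pipeline: a splitter turns raw text into tokens, and each further stage maps a token list to a new token list. This is a minimal sketch under assumptions, not Oracle Intermedia's or ZCatalog's API. The `Analyzer` class and the stage functions are hypothetical, and `naive_stem` is a toy suffix stripper, not the Porter stemmer.

```python
import re
from typing import Callable, Iterable, List

# A stage takes a list of tokens and returns a new list of tokens.
Stage = Callable[[List[str]], List[str]]

def split_words(text: str) -> List[str]:
    """Splitter: break raw text into alphanumeric word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase(tokens: List[str]) -> List[str]:
    """Lexer-style normalisation: fold case."""
    return [t.lower() for t in tokens]

def make_stopword_filter(stopwords: Iterable[str]) -> Stage:
    """Build a stage that drops the given stop words."""
    stop = frozenset(stopwords)
    def remove_stopwords(tokens: List[str]) -> List[str]:
        return [t for t in tokens if t not in stop]
    return remove_stopwords

def naive_stem(tokens: List[str]) -> List[str]:
    """Toy stemmer: strip one of a few English suffixes, keeping at least 3 chars."""
    out = []
    for t in tokens:
        for suffix in ("ing", "es", "s"):
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

class Analyzer:
    """Run a splitter, then each stage in order, over a piece of text."""

    def __init__(self, splitter: Callable[[str], List[str]], stages: Iterable[Stage]):
        self.splitter = splitter
        self.stages = list(stages)

    def __call__(self, text: str) -> List[str]:
        tokens = self.splitter(text)
        for stage in self.stages:
            tokens = stage(tokens)
        return tokens
```

With `Analyzer(split_words, [lowercase, make_stopword_filter({"the", "of", "a"}), naive_stem])`, the text "The Indexing of Documents" yields `["index", "document"]`. Because every stage is just a callable, swapping in a real stemmer or a language-specific splitter means changing the list passed to `Analyzer`, not the indexer that stores its output.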

Such a project is not trivial and cannot be handled by one person; it
requires several volunteers ;-)

Andreas