[Zope] Experimental searchable mail list archive

Michel Pelletier michel@digicool.com
Tue, 28 Sep 1999 00:28:57 -0400


Greetings,

I finally got sick of paging through endless archive messages, so I
implemented an experimental searchable list archive:

http://www.zope.org:12080/archives/Catalog/S

It will present you with a single text search box.  This is a very
trivial interface; it will be expanded upon.

Please try to use this over the next few days and see if it helps
answer your questions.

I used the fsimport script to import the entirety of the pipermail
archive, and then cataloged it with the 'Find objects' Catalog tab.  In
the process, I fixed a silly design flaw; the fix improved the mass
indexing speed of Catalog by at least 200% and greatly reduced the
memory overhead and thrashing.  The dataset of documents is 56MB, and
the total dataset plus indexes is 64MB.  Not bad.  It took 6 minutes to
index the entire dataset with a 10000-word subtransaction threshold, and
the process footprint grew to 85MB.  Catalog has come a long way in
terms of speed and memory usage.
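
To illustrate the general idea behind a word-count threshold, here is a
minimal Python sketch.  It is not Catalog's actual code: the class name
BatchedIndex, its methods, and the in-memory dictionary are all
hypothetical stand-ins.  Postings are buffered and only merged into the
index once the buffer holds at least `threshold` words, which is
roughly the role a subtransaction threshold plays in keeping memory
bounded during mass indexing.

```python
from collections import defaultdict


class BatchedIndex:
    """Toy inverted index that merges buffered postings in batches.

    Hypothetical illustration only; this is not how Zope's Catalog
    is implemented.
    """

    def __init__(self, threshold=10000):
        self.threshold = threshold      # words to buffer before flushing
        self.index = defaultdict(set)   # word -> set of document ids
        self.pending = []               # buffered (word, doc_id) pairs
        self.flushes = 0                # number of batch merges so far

    def index_document(self, doc_id, text):
        """Buffer every word of `text`; flush once the buffer is full."""
        for word in text.lower().split():
            self.pending.append((word, doc_id))
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        """Merge buffered postings into the index and empty the buffer."""
        for word, doc_id in self.pending:
            self.index[word].add(doc_id)
        self.pending = []
        self.flushes += 1

    def search(self, word):
        """Return the sorted ids of documents containing `word`."""
        if self.pending:
            self.flush()
        return sorted(self.index.get(word.lower(), ()))
```

A larger threshold means fewer flushes but a bigger buffer in memory; a
smaller one trades speed for a smaller footprint.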

Further improvements are to parse the documents into rfc822 Messages
(probably with a ZClass), index all interesting attributes (date,
author, etc.), and implement a simple ZPublisher.Client script that
Mailman calls to 'push' a message up to the server, instantiate a new
message object, and incrementally index it in the Catalog.
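
As a rough sketch of the parsing step only, the following uses the
`email` package from today's Python standard library (the successor to
the old rfc822 module) to pull out the attributes mentioned above.  The
function name extract_fields and the shape of the returned dictionary
are assumptions made for illustration; the ZClass, the
ZPublisher.Client push, and the Catalog call are deliberately left out.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def extract_fields(raw_message):
    """Parse one RFC 822 style message and return indexable attributes.

    Hypothetical helper; the field names are illustrative only.
    """
    msg = message_from_string(raw_message)
    name, address = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        "author": name or address,   # fall back to the bare address
        "address": address,
        "date": parsedate_to_datetime(date_header) if date_header else None,
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload(),   # a string for non-multipart mail
    }
```

Each value in the returned dictionary could then be handed to whatever
index is appropriate for it, for example a date index for "date" and a
text index for "body".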

Stay tuned.

-Michel