[Zope] Experimental searchable mail list archive

The Dragon De Monsyne dragondm@integral.org
Tue, 5 Oct 1999 05:06:53 -0500 (CDT)


On Tue, 28 Sep 1999, Michel Pelletier wrote:

> Greetings,
> 
> I finally got sick of paging through endless archive messages, so I
> implemented an experimental searchable list archive:
> 
> http://www.zope.org:12080/archives/Catalog/S
> 
> will present you with a single text search box.  This is a very trivial
> interface; it will be expanded upon.
> 
> Please try and use this over the next few days and see if it helps answer
> your questions.
> 
> I used the fsimport script to import the entirety of the pipermail
> archive, and then cataloged it with the 'Find objects' Catalog tab.  In
> the process, I fixed a silly design flaw that improved the mass indexing
> speed of catalog by at least 200% and greatly reduced the memory
> overhead and thrashing.  The dataset of documents is 56MB, the total
> dataset plus indexes is 64MB.  Not bad.  It took 6 minutes to index the
> entire dataset with a 10000 word subtransaction threshold and the
> process footprint grew to 85MB.  Catalog has come a long way in terms of
> speed and memory usage.
> 
> Further improvements are to parse the documents into rfc822 Messages
> (probably with a ZClass), index all interesting attributes (date,
> author, etc.), and implement a simple ZPublisher.Client script that
> mailman calls to 'push' a message up to the server, instantiate a new
> message object, and incrementally index it in the Catalog.
> 
	Hmmm! Whaddayaknow! This is exactly what I've been working on! 
I've been planning out a product called MessageBase to do this.  I'm
sketching out the Message class right now. I'm planning on it having full
MIME support. (One of the things I have gotten done so far is an improved
version of Python's mimetools module that is actually compliant with the
MIME RFCs.)
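
	To make the idea concrete, here is a rough sketch of pulling the
"interesting attributes" (date, author, subject) and the MIME parts out of
a raw message so they could be handed to an index. It is not the
MessageBase code and not the modified mimetools module, neither of which is
shown in this post; it assumes the email package from a current Python
standard library. The function name extract_fields, the dictionary keys,
and the sample message are all made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample message, not taken from the real archive.
RAW = """\
From: Jane Example <jane@example.org>
To: zope@zope.org
Subject: [Zope] test message
Date: Tue, 5 Oct 1999 05:06:53 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

Hello, list.
--XYZ
Content-Type: text/html; charset="us-ascii"

<p>Hello, list.</p>
--XYZ--
"""


def extract_fields(raw):
    """Return a dict of attributes one might want to index (assumed names)."""
    msg = message_from_string(raw)

    # Split the From header into a display name and a bare address.
    author_name, author_address = parseaddr(msg.get("From", ""))

    # Parse the Date header; leave it as None if missing or malformed.
    date = None
    date_header = msg.get("Date")
    if date_header:
        try:
            date = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            date = None

    # Walk the MIME tree and collect each leaf part as (content type, text).
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "ascii"
        parts.append((part.get_content_type(),
                      payload.decode(charset, "replace")))

    return {
        "author_name": author_name,
        "author_address": author_address,
        "subject": msg.get("Subject", ""),
        "date": date,
        "parts": parts,
    }


if __name__ == "__main__":
    fields = extract_fields(RAW)
    print(fields["author_address"], fields["date"], fields["subject"])
    for content_type, text in fields["parts"]:
        print(content_type, repr(text))
```

	Each key of the returned dict would map onto one index in the
Catalog; how the resulting object actually gets cataloged depends on the
Zope setup and is left out here.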

	-The Dragon De Monsyne