[Zope] Advice on searching/indexing Word documents?

sean.upton@uniontrib.com
Tue, 02 Jan 2001 16:50:58 -0800


Cool.  I'll have to take a look at this.  Does anyone know of any effort to
write document filters for use with Zope?  A lot of commercial products used
for knowledge management (like NextPage LivePublish, some intranet search
engines, etc.) already have features like this, and I think a document-filter
project would be a good idea if something like it doesn't already exist.

Possible things that could be filtered for input:
- The IPTC header data from a JPG/TIF image, which contains a few things like
the caption (the same one that you can edit in Photoshop).  This would be a
good addition to the various Image classes.
- Office documents (Word, Excel, PowerPoint, WordPerfect, StarOffice, etc.)
- PDF and PostScript documents
- Illustration files (Illustrator, CorelDRAW)
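As a rough illustration of the IPTC item above, here is a minimal Python sketch that pulls the caption (IIM record 2, dataset 120) out of a raw IPTC IIM block, plus a hypothetical registry mapping content types to text filters. It assumes the IIM bytes have already been located inside the JPEG or TIFF file (that step is not shown), assumes Latin-1 text, and skips extended-length datasets. The names `FILTERS`, `extract_text`, and the content type `application/x-iptc` are made up for this example and are not part of Zope.

```python
import struct

# IPTC IIM datasets are stored as: 0x1C tag marker, record number (1 byte),
# dataset number (1 byte), a 2-byte big-endian length, then the data.
IPTC_MARKER = 0x1C
CAPTION = (2, 120)  # Caption/Abstract lives in record 2, dataset 120


def iptc_datasets(blob):
    """Yield (record, dataset, value) tuples from a raw IPTC IIM block."""
    i = 0
    while i + 5 <= len(blob):
        if blob[i] != IPTC_MARKER:
            break  # not IIM data (or padding); stop rather than guess
        record, dataset, length = struct.unpack(">BBH", blob[i + 1:i + 5])
        if length & 0x8000:
            break  # extended-length datasets are not handled in this sketch
        start = i + 5
        yield record, dataset, blob[start:start + length]
        i = start + length


def iptc_caption_filter(blob):
    """Return the caption text; assumes Latin-1 (the real charset may differ)."""
    return "\n".join(
        value.decode("latin-1")
        for record, dataset, value in iptc_datasets(blob)
        if (record, dataset) == CAPTION
    )


# Hypothetical registry: content type -> function returning indexable text.
FILTERS = {
    "text/plain": lambda data: data.decode("latin-1"),
    "application/x-iptc": iptc_caption_filter,  # made-up content type
}


def extract_text(content_type, data):
    """Run the matching filter, or return an empty string if none is registered."""
    text_filter = FILTERS.get(content_type)
    return text_filter(data) if text_filter else ""


if __name__ == "__main__":
    def dataset(record, number, value):
        return bytes([IPTC_MARKER, record, number]) + struct.pack(">H", len(value)) + value

    block = dataset(2, 5, b"Object name") + dataset(2, 120, b"Harbor at dusk")
    print(extract_text("application/x-iptc", block))  # Harbor at dusk
```

Keeping each filter as a plain function that takes bytes and returns text means new formats (Office, PDF, PostScript) could be added to the registry without touching the indexing code.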

Such filters would be very valuable for using Zope in knowledge management
and digital asset management.  Is anyone working on anything like this?

Sean

-----Original Message-----
From: Jonothan Farr [mailto:jfarr@real.com]
Sent: Tuesday, January 02, 2001 4:22 PM
To: sean.upton@uniontrib.com; zope@zope.org
Subject: Re: [Zope] Advice on searching/indexing Word documents?


> I used to
> write text filters in C and Lex for my previous employer - one of these days
> I will figure out how to extend Python with C and do this.

Here's one that's written entirely in Python:
http://www.cosc.canterbury.ac.nz/~greg/python/Plex/

I've seen a couple of other implementations out there.
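The idea of a Lex-style text filter can be sketched in plain Python. This is not Plex's API; it is a minimal illustration using only the standard `re` module, and the rule table, `scan`, and the choice of tokens are all hypothetical. It assumes the input has already been decoded to text (the usage example decodes raw bytes as Latin-1 purely as an assumption).

```python
import re

# Hypothetical rule table in the spirit of a Lex specification: each entry is
# (regex, action). An action of None means "match and discard".
RULES = [
    (r"[A-Za-z][A-Za-z']*", str.lower),  # words, lowercased for the index
    (r"[0-9]+", str),                    # numbers, kept as-is
    (r"\s+", None),                      # whitespace
    (r".", None),                        # any other single character
]

COMPILED = [(re.compile(pattern, re.DOTALL), action) for pattern, action in RULES]


def scan(text):
    """Yield tokens using Lex-style longest match; ties go to the earlier rule."""
    pos = 0
    while pos < len(text):
        best_end, best_action = pos, None
        for regex, action in COMPILED:
            match = regex.match(text, pos)
            if match and match.end() > best_end:
                best_end, best_action = match.end(), action
        if best_end == pos:
            raise ValueError(f"no rule matches at position {pos}")
        if best_action is not None:
            yield best_action(text[pos:best_end])
        pos = best_end


if __name__ == "__main__":
    raw = b"Zope indexes 42 Word files.\x00\x01"
    print(list(scan(raw.decode("latin-1"))))
```

Python's `re` alternation takes the first alternative that matches rather than the longest, which is why the loop tries every rule and keeps the longest match, as Lex does.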

--jfarr