[Zope-dev] cataloging binary files (pdf's, word docs...)

Roman Milner roman@speeder.com
15 Feb 2000 23:06:43 -0600


I'm trying to come up with a way to catalog PDFs and Word docs.  It
is easy to write Python methods to pull the text out of these.  The
problem is that we already have tons of them in our ZODB as File
objects.

The only thing I can think of is to make a ZClass for each type
(e.g. a PDFFile type) that has a method that knows how to extract the
text from the PDF, and have ZCatalog index that method's result.  But
this means re-creating all the binary files currently in our ZODB.

Can anyone offer any better suggestions?  I could write a Python
method that extracted the text based on MIME type, but I can't go back
and add that method to each File object.
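
To make the MIME-type idea concrete, here is a rough sketch of the
dispatch part only.  It assumes each stored object can hand over its
content type and raw bytes; the names EXTRACTORS, register and
searchable_text are made up for illustration, and the plain-text
extractor is just a stand-in for real PDF or Word extractors.  It does
not solve the problem of getting the catalog to call it for existing
objects.

```python
# Hypothetical registry: MIME type -> function turning raw bytes into text.
EXTRACTORS = {}


def register(mime_type):
    """Decorator that records an extractor function for one MIME type."""
    def decorator(func):
        EXTRACTORS[mime_type] = func
        return func
    return decorator


@register("text/plain")
def extract_plain(data):
    # Stand-in extractor; PDF or Word extractors would be registered the
    # same way under "application/pdf" or "application/msword".
    return data.decode("latin-1")


def searchable_text(content_type, data):
    """Return indexable text for the data, or '' if the type is unknown."""
    # Drop parameters such as "; charset=..." and normalise case.
    mime_type = content_type.split(";")[0].strip().lower()
    extractor = EXTRACTORS.get(mime_type)
    if extractor is None:
        return ""
    return extractor(data)
```

Since the function lives outside the stored objects, nothing in the
ZODB has to be re-created just to add a new extractor.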

Thanks for any help.

^Roman