[Zope-dev] How do index the contents of a File?

Michel Pelletier michel@digicool.com
Fri, 7 Jan 2000 14:58:19 -0500


> -----Original Message-----
> From: Evan Simpson [mailto:evan@tokenexchange.com]
> Sent: Friday, January 07, 2000 1:50 PM
> To: bauer@atlas.unisinos.br; zope-dev@zope.org
> Subject: Re: [Zope-dev] How do index the contents of a File?
> 
> 
> From: Carlos Henrique Bauer <bauer@atlas.unisinos.br>
> > I created a ZCatalog to index all the contents of a site. To index the
> > contents of the documents I added a 'raw' text index to the catalog, but
> > it seems to be working just with DTML Documents. Do I need to add another
> > index to the catalog to make it index File documents or is it a ZCatalog
> > bug?
> 
> File objects are not 'documents', although they may of course contain
> files which you consider to be documents.  Since File objects can contain
> *anything*, including binary executables or images, they don't define
> 'contents' for ZCatalog purposes.

Well stated, Evan.  Files do have an attribute called 'data' that
ZCatalog will use like any other attribute (if you have an index called
'data'); however, as you pointed out, you may end up indexing quite a
bit of garbage.
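
To make the 'garbage' point concrete, here is a rough sketch in plain
Python, not actual ZCatalog code.  FakeFile and index_value are
hypothetical stand-ins: the sketch only assumes that an index looks up
the attribute with its own name on the object and calls it if it is
callable.

```python
class FakeFile:
    """Hypothetical stand-in for a File object; only 'data' matters here."""

    def __init__(self, data):
        self.data = data


def index_value(obj, name):
    """Fetch what an index called `name` would see on `obj`.

    Assumption: plain attribute lookup, calling the value if it is callable.
    """
    value = getattr(obj, name, None)
    if callable(value):
        value = value()
    return value


text_file = FakeFile("Zope catalogs index attributes by name.")
# The first bytes of a PNG image, standing in for arbitrary binary content.
binary_file = FakeFile("\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")

# Naive whitespace tokenizing, roughly what a text index starts from.
text_words = index_value(text_file, "data").split()
binary_words = index_value(binary_file, "data").split()
```

The text file yields ordinary words, while the image yields fragments
such as 'IHDR' mixed with control characters, none of which anyone would
search for.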
 
> What you might want to do is create a sub-ZClass of File dedicated to
> storing your document files, and define a PrincipiaSearchSource method
> which returns the contents of the object.  You could even pre-process the
> contents in this method if you wanted to strip off headers or formatting
> or any other bits you don't really want in your full-text search.

This is probably the best solution.
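
A minimal sketch of what such a method might look like, written as a
plain Python class rather than a real ZClass.  DocumentFile is a
hypothetical name, and the pre-processing rests on two assumptions of
mine: that every stored document starts with a header block ended by a
blank line, and that its formatting is HTML-style tags.

```python
import re


class DocumentFile:
    """Hypothetical stand-in for a File subclass that stores text documents."""

    def __init__(self, data):
        self.data = data

    def PrincipiaSearchSource(self):
        """Return the text that should be full-text indexed."""
        text = self.data
        # Assumption: a header block precedes the first blank line; drop it.
        header, separator, body = text.partition("\n\n")
        if separator:
            text = body
        # Assumption: formatting is HTML-style tags; replace each with a space.
        text = re.sub(r"<[^>]+>", " ", text)
        # Collapse runs of whitespace left behind by the steps above.
        return " ".join(text.split())


doc = DocumentFile("Title: Report\nAuthor: X\n\n<p>Hello <b>world</b></p>\n")
searchable = doc.PrincipiaSearchSource()
```

With the sample document above, only the body words are left for the
text index; the headers and tags never reach the catalog.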

-Michel