[Zope] Zope search engine

Kent Polk kent@goathill.org
Fri, 19 Feb 1999 08:35:11 -0600 (CST)


Regarding search indexing...

The built-in Zope Find manager appears to provide searches
constrained by:
- object type
- ids
- full-text matches (containing)
- mod dates
- folder

I would like to be able to provide application-specific index/search
capabilities. Here's my view:

search by:
- object type. Be able to easily specify a new object type to
   constrain searching.  Seems Zope has that covered.

- ids. This is a search 'by name'.  Covered?

- attribute.  This could correspond to a bibliographic search.
  Paul mentioned Brian's addition that lets one set object 
  attributes via tags in a document.  One might also consider
  providing an 'index_bibliography' or 'index_attribute' method
  which, if it exists, would allow an application manager to 
  provide a way to index document bibs. This would make it
  easier to index sql document collections or filesystem document
  collections without having to have the info directly stored in
  the Zope database.

- full-text. The existing search appears to handle only exact
  matches against internal documents?  I'd like to see the indexer
  look for an additional 'index_body' method which, if it exists,
  would allow an app developer to provide a way to index text
  documents that possibly aren't in the Zope database.

- query language. Needs one. Not sure how well featured it needs
  to be.

- mod dates. Need a way to register mod-dates as an attribute for
  query results, etc. Intentionally vague here as I don't know what
  the answer should be. However, it seems to me to go completely
  against the grain of Zope to only provide datestamp capabilities
  for simple objects.  The whole idea of Zope (to me) is to provide
  automated site facilities to manage large document collections
  and to be able to build those documents from content objects.
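
To make the 'index_attribute' / 'index_body' idea concrete, here is a
rough sketch in plain Python. It is not existing Zope code: only the two
hook names come from the list above, and HookIndexer, ExternalReport and
the search methods are hypothetical names made up for illustration. The
indexer asks each object for the optional hooks and skips objects that
don't provide them.

```python
import re
from collections import defaultdict


class HookIndexer:
    """Toy indexer (hypothetical) that asks each object how to index it."""

    def __init__(self):
        self.words = defaultdict(set)   # word -> ids of objects containing it
        self.attributes = {}            # object id -> attribute mapping

    def index_object(self, oid, obj):
        # Both hooks are optional; objects without them are simply skipped.
        attr_hook = getattr(obj, "index_attribute", None)
        if callable(attr_hook):
            self.attributes[oid] = dict(attr_hook())
        body_hook = getattr(obj, "index_body", None)
        if callable(body_hook):
            for word in re.findall(r"\w+", body_hook().lower()):
                self.words[word].add(oid)

    def search_text(self, word):
        # Single-word lookup only; a real query language would go here.
        return sorted(self.words.get(word.lower(), ()))

    def search_attribute(self, name, value):
        return sorted(oid for oid, attrs in self.attributes.items()
                      if attrs.get(name) == value)


class ExternalReport:
    """Hypothetical document whose content lives in SQL or on disk."""

    def __init__(self, author, text):
        self.author = author
        self.text = text

    def index_attribute(self):
        return {"author": self.author}

    def index_body(self):
        return self.text


indexer = HookIndexer()
indexer.index_object("report-1",
                     ExternalReport("Kent", "Quarterly results summary"))
indexer.index_object("plain", object())   # no hooks: ignored by the indexer
```

The point of the design is that the indexer never needs to know where
the text or the bibliographic data actually lives; the application
supplies that through the hooks.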

For example, 4 years ago I worked on a project for the NRC which
stored SGML'ed document content in an Oracle database. It was 
almost REALLY cool as you could build a document by performing
an SQL query that collected the results, evaluated the SGML and
produced the report according to your desires (almost). It then
date-stamped the document according to deterministic rules. That
experience greatly shaped how I view information and it is at 
odds with the way that most people think. Fortunately for me, I
think they are wrong. :^)

The QRS reporting system that Ty and I developed with Principia is
similar to the NRC system in that you perform a query, a document
is built, edited, published in html/pdf/postscript on the fly,
referenced, with a complete modification transaction history
available, etc.  Each document has a variety of dates associated
with it and rules are available to set a moddate (expiration date
also, etc).

I'd like to see (again) a method available to set the moddate for
objects that aren't statically in the Zope object database, even
though their parts may be in there, such as Z Tables or even 
TinyTables databased documents.  I would like for the index method
to have access to the moddate. Not sure if it really needs to be
further a part of Zope... Yes, this moddate could just be another
attribute, and maybe it should be done that way, but I just want
to make sure the index engine has a way of accessing it and knowing
that it is the 'official' moddate for that object.
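
One way this could look, as a sketch only: the index engine checks for
an optional moddate hook and treats its result as the 'official' date.
The hook name 'index_moddate', the helper official_moddate and the
BuiltReport class are all hypothetical, and "latest part wins" is just
one possible deterministic rule.

```python
from datetime import datetime


def official_moddate(obj, default=None):
    """Return the object's own moddate if it provides one, else default."""
    hook = getattr(obj, "index_moddate", None)   # hypothetical hook name
    if callable(hook):
        return hook()
    return default


class BuiltReport:
    """Hypothetical document assembled from parts with their own dates."""

    def __init__(self, part_dates):
        self.part_dates = list(part_dates)

    def index_moddate(self):
        # Example rule: the document is as new as its newest part.
        return max(self.part_dates)


report = BuiltReport([datetime(1999, 1, 5), datetime(1999, 2, 19)])
moddate = official_moddate(report)
```

An object with no hook falls back to whatever default the index engine
passes in, so existing objects would keep working unchanged.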

I've also been experimenting with publishing filesystem-stored 
base documents. As another person just mentioned, I have a customer
who insists that the base document information be stored in the
filesystem. I can provide the mod-date, but again, I need a way
to let Zope call indexing methods which I could provide for these
different applications.
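
For the filesystem case, the application-side methods might be as small
as the sketch below. FilesystemDocument and the 'index_moddate' hook
name are hypothetical; 'index_body' is the hook proposed earlier in this
message. The file's modification time is assumed to be an acceptable
moddate, which may not hold for every application.

```python
import os
from datetime import datetime


class FilesystemDocument:
    """Hypothetical wrapper around a base document kept on disk."""

    def __init__(self, path):
        self.path = path

    def index_body(self):
        # Read the text at indexing time; nothing is copied into the
        # object database.
        with open(self.path, encoding="utf-8") as f:
            return f.read()

    def index_moddate(self):
        # Assumption: the file's mtime is the document's moddate.
        return datetime.fromtimestamp(os.path.getmtime(self.path))
```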

Comments? What are you DC guys thinking of for a search engine?

Thanks!
Kent