[Zope] ZCatalog searching questions

Dave Kimmel David.Kimmel@just.gov.ab.ca
Thu, 30 Sep 1999 09:18:21 -0600


> Some people think, 'why not use re (the Python regex module)?'  Because
> searching for something like '*ing' would require iterating over all the
> keys, and a linear search like this could take multiple orders of
> magnitude more time than a non-regex search.

I understand this, having worked with various relational and not quite so
relational databases in the past.  However, even on a 355,000 record file
UniVerse can perform a search using its strange regular-expression-like
system in ~20 seconds (that is without the benefit of indexes; when
searching on an indexed field the same search takes about 2 seconds).
So I suspect that even if it has to iterate through every key, there won't
be any performance problems.
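
To make the cost being discussed concrete, here is a minimal sketch of
that kind of linear scan in Python.  The `lexicon` dict and the
`regex_search` function are hypothetical stand-ins for illustration, not
ZCatalog's actual data structures or API.

```python
import re

# Hypothetical stand-in for a catalog lexicon: word -> set of record ids.
lexicon = {
    "searching": {1, 4},
    "indexing": {2},
    "zope": {1, 2, 3},
    "catalog": {3, 4},
}

def regex_search(lexicon, pattern):
    """Return ids of records containing any word that fully matches pattern.

    Every key is tested against the regex, so the cost grows linearly
    with the number of distinct words in the lexicon.
    """
    compiled = re.compile(pattern)
    hits = set()
    for word, record_ids in lexicon.items():
        if compiled.fullmatch(word):
            hits |= record_ids
    return hits

# The glob '*ing' written as a regular expression.
result = regex_search(lexicon, r".*ing")
```

An exact-word lookup is a single dictionary access, which is where the
orders-of-magnitude difference comes from on a large lexicon.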

Also, this brings up another point...  Can the basic search interface on the
Folder object itself do regex searches on the content?  Can it be hacked to
do so?  Speed isn't a concern at the moment since the dataset will be quite
small, and the machine it's on has plenty of power to spare.

> There is a pretty good compromise solution called n-grams, but they also
> result in a lexicon increase, and a much more complicated algorithm.  I
> can refer you to a good book that describes them.
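
For reference, a rough sketch of the n-gram idea in Python, assuming
trigrams, a `$` word-boundary marker, and patterns that use only `*` as
a wildcard.  All names here (`ngrams`, `build_index`, `glob_search`) are
made up for illustration and have nothing to do with ZCatalog's code.

```python
from fnmatch import fnmatchcase

def ngrams(text, n=3):
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(words, n=3):
    """Map each n-gram to the set of words containing it.

    Words are wrapped in '$' markers so that grams can anchor to the
    start or end of a word.  Each word is stored under several grams,
    which is the lexicon increase mentioned above.
    """
    index = {}
    for word in words:
        for gram in ngrams("$" + word + "$", n):
            index.setdefault(gram, set()).add(word)
    return index

def glob_search(index, words, pattern, n=3):
    """Find words matching a glob pattern whose only wildcard is '*'."""
    # Collect the n-grams of every literal piece of the pattern.
    grams = set()
    for piece in ("$" + pattern + "$").split("*"):
        grams |= ngrams(piece, n)
    # A matching word must contain all of those grams.
    candidates = set(words)
    for gram in grams:
        candidates &= index.get(gram, set())
    # The grams only narrow the field; verify each candidate for real.
    return sorted(w for w in candidates if fnmatchcase(w, pattern))

words = ["searching", "indexing", "zope", "catalog", "ingot"]
index = build_index(words)
matches = glob_search(index, words, "*ing")
```

The final verification step matters: "ingot" contains the gram "ing" but
not "ng$", so it is filtered out before the pattern is ever tested.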

For how little they're paying me, I think I'll pass on this one  ;-)

> Yes.  The 'lexicon' has a hardwired 'synonym and stopword' dictionary in
> lib/python/SearchIndex/Lexicon.py.  This is also projected to be
> improved by allowing through-the-web lexicon management (like specifying
> stopwords and synonyms).  Someone also suggested interfacing it to some
> kind of synonym database; you'd have to search through the archives to
> find the reference.

Cool!  I was thinking of doing this myself, but if I just have to edit a
single file it shouldn't be all that bad!
  
> > Am I asking too much of this?  Should I be buying a Python book and
> > adding this functionality myself?  Should I be using something other
> > than ZCatalog?  Should I be using something other than Zope?  (Please
> > say no, I happen to like Zope!)
> 
> Go for it, but don't give up on ZCatalog or Zope.  I'd be surprised if
> you found fully featured regex searching in another package that would
> take less of a headache to use than just implementing a simple
> 'reversed' lexicon that lets you do globbing (like DOS wildcards, no *s
> in the middle of words, etc.).
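
As I understand the 'reversed' lexicon suggestion, it would look
something like the Python sketch below: keep one sorted list of words
and one sorted list of reversed words, so that a trailing or a leading
`*` both turn into a prefix lookup.  The `GlobLexicon` class is
hypothetical and is only my guess at the idea, not anything that exists
in ZCatalog.

```python
from bisect import bisect_left

class GlobLexicon:
    """Sorted word lists supporting 'abc*' and '*abc' style globs."""

    def __init__(self, words):
        unique = set(words)
        self.forward = sorted(unique)
        # The 'reversed' lexicon: every word stored backwards.
        self.backward = sorted(w[::-1] for w in unique)

    @staticmethod
    def _prefix_range(sorted_words, prefix):
        """Return all entries of sorted_words that start with prefix."""
        start = bisect_left(sorted_words, prefix)
        end = start
        while end < len(sorted_words) and sorted_words[end].startswith(prefix):
            end += 1
        return sorted_words[start:end]

    def search(self, pattern):
        core = pattern.strip("*")
        if "*" in core or (pattern.startswith("*") and pattern.endswith("*")):
            raise ValueError("only a single leading or trailing * is supported")
        if pattern.startswith("*"):
            # '*ing' becomes a prefix search for 'gni' in the reversed list.
            found = self._prefix_range(self.backward, core[::-1])
            return sorted(w[::-1] for w in found)
        if pattern.endswith("*"):
            return self._prefix_range(self.forward, core)
        # No wildcard at all: plain exact lookup.
        i = bisect_left(self.forward, core)
        if i < len(self.forward) and self.forward[i] == core:
            return [core]
        return []

lex = GlobLexicon(["searching", "indexing", "zope", "zcatalog", "ingot"])
suffix_matches = lex.search("*ing")
prefix_matches = lex.search("z*")
```

The price is storing every word twice, and a `*` in the middle of a word
is still not supported.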

I'm thinking of using a different search product since I don't know Python
at all, and I'm not sure I want to start out by adding features to such a
large and well thought out product.  Are there any other search products
available for Zope?

As disgusting as the thought is, perhaps I should be looking into using a
relational database for this (just for the improved searching - I'd rather
use a pure Zope solution)?  Does anyone have any suggestions for storing
hierarchical data in a relational database?

And while on the subject of databases...  Does Zope/Python have any way of
interfacing with a MultiValue database such as UniVerse (preferred since we
already use UV here), jBase, UniData, D3, etc.?

Thanks!
-- Dave Kimmel
Systems Analyst
Office of the Public Trustee, Alberta Justice