[Zope] Re: ZCatalog searching bugs + fixes.

Michel Pelletier michel@digicool.com
Tue, 02 May 2000 18:03:11 -0700


Maas-Maarten Zeeman wrote:
> 
> Hello Michel,
> 
> After some testing with ZCatalog I discovered that only the AND
> operator works correctly with UnTextIndexes. After some further
> investigation I fixed the problems with the OR and the AND_NOT
> operators. Here are the fixes.

Cool.
 
> The ANDNOT operator was a silly bug.
> 
> In ResultList.py change:
 
<snip>  This has actually been fixed in CVS for a couple of weeks now, I
believe.  Were you working against a CVS version?
 
> The OR bug was a bit more complicated. It only worked if both the left
> and right operands returned data, so if you searched for one term that
> was not in the Lexicon and one that was, you would get nothing as a
> search result.
> 
> The problem is that a KeyError is raised when the search term is not in
> the lexicon. So the search is stopped instantly.
> 
> The fix for the OR bug
> 
> An extra get operator for Lexicon
> 
> def get(self, key, default=None):
>     """ overload mapping behavior """
>     return self._lexicon.get(key, default)
> 
> and change the following line in '__getitem__' in UnTextIndex from
> # r = self._index.get(self._lexicon[word], None)
> to
> r = self._index.get(self._lexicon.get(word), None)

Ah yes, silly error.  Could you please submit this to the Collector?
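
To make the quoted fix concrete, here is a minimal sketch of why the
KeyError kills an OR search and how a get() with a default avoids it.
The classes Lexicon and ToyTextIndex, the helper search_or, and the
sample data are toy stand-ins invented for illustration; they are not
the real Zope classes, and only the lookup logic mirrors the patch.

```python
class Lexicon:
    """Toy lexicon: maps each word to an integer word id (hypothetical)."""

    def __init__(self, words):
        self._lexicon = {word: i for i, word in enumerate(words)}

    def __getitem__(self, key):
        # Raises KeyError for a word that was never indexed.
        return self._lexicon[key]

    def get(self, key, default=None):
        # The extra accessor proposed in the patch.
        return self._lexicon.get(key, default)


class ToyTextIndex:
    """Toy index: maps word ids to sets of document ids (hypothetical)."""

    def __init__(self, lexicon, postings):
        self._lexicon = lexicon
        self._index = postings

    def lookup_strict(self, word):
        # Old behaviour: an unknown word raises KeyError.
        return self._index.get(self._lexicon[word], None)

    def lookup(self, word):
        # Patched behaviour: an unknown word yields None instead.
        return self._index.get(self._lexicon.get(word), None)


def search_or(index, left, right):
    # Union of both operands; a missing term counts as an empty set.
    return (index.lookup(left) or set()) | (index.lookup(right) or set())


lexicon = Lexicon(["zope", "catalog"])
index = ToyTextIndex(lexicon, {0: {1, 2}, 1: {2, 3}})
print(search_or(index, "zope", "missing"))
```

With lookup_strict in place of lookup, the same call would raise
KeyError on "missing" before the union was ever computed.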
 
> The Near operator also doesn't work. I'm not sure how to fix this
> because UnTextIndexes don't save position information.

This is legacy code from ZTables.  I didn't write it and I also didn't
take it out.  It's kinda like your appendix: you don't use it, but it's
there.  In the future, indexes may store positional information.
 
> All these bugs leave me wondering whether the catch-all used in
> _indexedSearch is necessary. If this had been programmed less bluntly,
> the errors would have been noticed much earlier.

You're right.  This method is also a carryover from ZTables.
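
As a rough illustration of the point about the catch-all, the sketch
below contrasts a bare except with one that only handles the expected
"term not found" case.  This is not the actual _indexedSearch code; the
function names and the deliberately buggy lookup are hypothetical.

```python
def search_swallowing(lookup, terms):
    """Catch-all version: any exception silently becomes 'no results'."""
    results = []
    for term in terms:
        try:
            results.append(lookup(term))
        except Exception:
            # Hides missing terms *and* genuine programming errors.
            return []
    return results


def search_narrow(lookup, terms):
    """Narrow version: only the expected 'unknown term' case is handled."""
    results = []
    for term in terms:
        try:
            results.append(lookup(term))
        except KeyError:
            # Unknown term: treat it as matching nothing.
            results.append(set())
    return results


def buggy_lookup(term):
    # Deliberate bug: adding a str and an int raises TypeError.
    return term + 1


print(search_swallowing(buggy_lookup, ["zope"]))  # bug is masked
try:
    search_narrow(buggy_lookup, ["zope"])
except TypeError as exc:
    print("bug surfaced:", exc)
```

With the narrow handler the TypeError propagates, so a bug like the OR
one would show up as a traceback rather than as an empty result.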
 
> Another question I have about ZCatalog is the following: Is ZCatalog
> usable for searching and indexing gigabytes of text?

That's a hard question to answer.  The short answer is yes.  Over time,
if you index your content incrementally, then you can easily text index
gigabytes of text.

If you were to try to feed gigabytes of text into Zope all in one shot,
it would probably fail due to either:

  1) not enough memory (no subtransactions)

  2) not enough tmp storage (with subtransactions)

Since Zope is transactional, all changes are accumulated before commit.
Because this commit happens 'all at once' for all objects, a commit on
gigabytes of objects *plus* indexes for them would take hours, maybe
days.  This is not a good way to do it.

The best way to index content is gradually, over time.  We call this
'incremental' indexing.  Using this slow scheme you could easily index
gigabytes of text.
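
A minimal sketch of the incremental idea: index in small batches and
commit after each one, so no single transaction has to hold everything.
The helper names (batches, index_incrementally), the batch size, and
the dict standing in for a catalog are all assumptions made for
illustration; in a real Zope setup the index_one and commit callables
would be supplied by your own code.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def index_incrementally(documents, index_one, commit, batch_size=100):
    """Index (uid, text) pairs batch by batch, committing after each batch."""
    total = 0
    for batch in batches(documents, batch_size):
        for uid, text in batch:
            index_one(uid, text)
        commit()  # keeps each transaction small
        total += len(batch)
    return total


# Stand-ins for a catalog and a transaction manager (hypothetical).
indexed = {}
commits = []
documents = ((i, "text %d" % i) for i in range(250))
count = index_incrementally(
    documents,
    indexed.__setitem__,
    lambda: commits.append(len(indexed)),
    batch_size=100,
)
print(count, commits)
```

Because documents is a generator, only one batch is held in memory at a
time, which is the property that matters for very large collections.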

-Michel