[Zope-dev] ZCatalog and 'fuzzy logic'

Steve Alexander s.alexander@lancaster.ac.uk
Tue, 09 Jan 2001 16:32:47 +0000


Morten W. Petersen wrote:

> Is there anyone who could try to give an estimate of how long it would
> take to add fuzzy logic (regexp-like) searching capability to the
> ZCatalog?
> 
> And reasoning as to why would be appreciated. ;)


Right now, you could use an External Method to apply a regex match to 
each unique value in a field index in a Catalog, and return the 
appropriate Catalog Brains for each match.

This is as easy as calling uniqueValues() on the catalog, iterating 
through the unique values to filter them, and then searching the catalog 
with the results of the filter as the constraint for that fieldindex. 
This would take somewhere between minutes and hours to implement and 
test, and would execute in O(number of unique field values) time. Even 
for many values of the fieldindex, that should remain acceptably fast 
where you have a catalog with many items, most of which have fields 
drawn from the same (small) set.
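
As a rough sketch of the steps above (not tested against a real Zope 
instance): StubCatalog is a hypothetical stand-in for a catalog, and the 
method names uniqueValuesFor() and searchResults() on it are assumptions 
modelled on ZCatalog, so check them against your Zope version. 
regex_search() is likewise just an illustrative name.

```python
import re


class StubCatalog:
    """Hypothetical stand-in for a catalog, for illustration only."""

    def __init__(self, records):
        self._records = records  # list of dicts, one per cataloged item

    def uniqueValuesFor(self, name):
        # Assumed to mirror the catalog call that lists an index's values.
        return sorted({r[name] for r in self._records})

    def searchResults(self, **query):
        # Each keyword maps an index name to a list of accepted values.
        def matches(record):
            return all(record.get(k) in v for k, v in query.items())
        return [r for r in self._records if matches(r)]


def regex_search(catalog, index_name, pattern):
    """Filter the unique values by regex, then search on the survivors."""
    rx = re.compile(pattern)
    # Cost is proportional to the number of unique values, not items.
    wanted = [v for v in catalog.uniqueValuesFor(index_name) if rx.search(v)]
    if not wanted:
        return []
    return catalog.searchResults(**{index_name: wanted})
```

In an External Method you would pass the real catalog object instead of 
the stub; the filtering loop stays the same.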

If you want to search a TextIndex using a regex, or you want to search 
for a pattern among a number of fields of the same item, then you're 
into an algorithm that would execute in O(number of cataloged items) 
time. That could get very slow for any sizable catalog.
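
To illustrate why this case is linear in the number of items, here is a 
minimal sketch that assumes each cataloged item can be read as a plain 
dict of field values. scan_items() and its parameters are hypothetical 
names, not part of any Zope API.

```python
import re


def scan_items(items, field_names, pattern):
    """Return the items where any of the named fields matches the regex."""
    rx = re.compile(pattern)
    hits = []
    for item in items:  # one pass over every cataloged item
        # Missing fields are treated as empty strings.
        if any(rx.search(str(item.get(f, ""))) for f in field_names):
            hits.append(item)
    return hits
```

There is no index to narrow the candidates here, so every item is 
visited once per search.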

The other option for searching a TextIndex is to use extensions to the 
NEAR, AND and OR operators that are currently supported. I guess it 
all depends on what you mean by "fuzzy matching".

--
Steve Alexander
Software Engineer
Cat-Box limited
http://www.cat-box.net