[Zope] Intersection/Union of ZCatalog result sets

Jonathan Hobbs toolkit at magma.ca
Fri Sep 24 09:10:25 EDT 2004


From: "Johan Carlsson" <johanc at easypublisher.com>
>
> >I am doing this to try to squeeze out some performance improvements from
> >a ZCTextIndex. We have a zcatalog with about 1 million documents that we
> >are full-text indexing and it no longer fits into memory (therefore
> >requiring many disk i/o's during retrieval which is seriously degrading
> >performance).
> >
> >Our zcatalog currently has 5 indexes: 4 minor indexes and one major index
> >(the main ZCTextIndex).  I am attempting to split the zcatalog into two
> >separate zcatalogs: one containing the 4 minor indexes and one containing
> >the ZCTextIndex.  The hope is that the zcatalog containing only the
> >ZCTextIndex will be smaller and will again fit into memory.
> >
> >
> Why would it be smaller?
> You still need to load the indexes when you do a search, right?
> Or do you intend to index different objects in different catalogs?
> In that case couldn't you use the idxs attribute
> of ZCatalog::catalog_object(self, obj, uid=None, idxs=None,
> update_metadata=1)?

Moving only the main ZCTextIndex (and its Lexicon) into a separate ZCatalog
should result in a smaller ZCatalog, because the space required by the other
4 indexes (3 Field Indexes and another ZCTextIndex) will be stored in a
different folder. I am going to load the ZCatalog containing the main
ZCTextIndex into a Temporary Folder (to hold it in memory).

Both ZCatalogs will index the same documents (stored in a separate
BTreeFolder2).
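
Because both ZCatalogs index the same documents, a record id (RID) from one
catalog means nothing in the other, but the uid each document was cataloged
under (typically its path) is shared. Below is a rough sketch of joining two
result sets on that shared key. It is plain Python, with dicts standing in
for each catalog's RID-to-uid mapping; the names rids_to_uids and
intersect_by_uid and the sample data are made up for illustration and are
not part of ZCatalog.

```python
def rids_to_uids(rids, rid_to_uid):
    """Translate one catalog's RIDs into the uids shared by both catalogs."""
    return {rid_to_uid[rid] for rid in rids}


def intersect_by_uid(text_rids, text_map, field_rids, field_map):
    """Return the sorted uids that appear in both result sets."""
    text_uids = rids_to_uids(text_rids, text_map)
    field_uids = rids_to_uids(field_rids, field_map)
    return sorted(text_uids & field_uids)


# Hypothetical data: the same three documents, with different RIDs per catalog.
text_map = {101: "/docs/a", 102: "/docs/b", 103: "/docs/c"}
field_map = {7: "/docs/a", 8: "/docs/b", 9: "/docs/c"}

hits = intersect_by_uid([101, 103], text_map, [8, 9], field_map)
print(hits)  # ['/docs/c']
```

Translating every RID costs one lookup per hit, so this is probably only a
starting point for large result sets.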


> >The only difficulty is in combining the results from searches of two
> >separate zcatalogs in an efficient manner.  My best guess at this point
> >is that I will have to patch the 'search' routine in ZCTextIndex to stop
> >it from 'Lazifying' the result sets, so that I can join/intersect the
> >result sets based on OIDs (instead of RIDs - which should be doable as
> >the result sets prior to 'lazifying' are xxBTrees and the BTrees product
> >comes with methods for join/intersection). I can then 'Lazify' the final
> >result set and return it.  At least that's the theory!
> >
> >
> Maybe do a version of ZCatalog (or rather Catalog) that uses OIDs as RIDs?
> Only problem is that OIDs are int64 and BTrees.IISet et al. use int32.
> So you would need an IISet that takes longs.

Thanks for the 'heads-up'.  I had hoped to use OIDs instead of RIDs, but
hadn't considered the 64/32-bit problem. I'll have to see if I can find a
64-bit BTrees package, and failing that, try to modify the existing package
to use long ints - this just keeps getting better and better :)
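
To make the 64/32-bit problem concrete: ZODB OIDs are 8-byte strings, while
the II* BTrees hold signed 32-bit integer keys. Here is a small sketch in
plain Python that decodes an OID as an unsigned 64-bit integer and checks
whether it would fit. The helper names oid_to_int and fits_int32 are
hypothetical, and the sample OIDs are invented.

```python
import struct

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def oid_to_int(oid):
    """Decode an 8-byte big-endian OID into an integer."""
    return struct.unpack(">Q", oid)[0]


def fits_int32(value):
    """True if value can be stored as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


# Invented sample OIDs: one small, one beyond the 32-bit range.
small_oid = struct.pack(">Q", 12345)
large_oid = struct.pack(">Q", 2**40)

print(fits_int32(oid_to_int(small_oid)))  # True
print(fits_int32(oid_to_int(large_oid)))  # False
```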

Thanks for the help!

Jonathan




