[Zope-dev] KeywordIndex performance / multiunion

Casey Duncan casey at zope.com
Fri Nov 7 09:26:25 EST 2003


On Fri, 07 Nov 2003 12:02:07 +0000
Seb Bacon <seb at jamkit.com> wrote:

> Casey Duncan wrote:
> > On Thu, 06 Nov 2003 19:11:55 +0000
> > Seb Bacon <seb at jamkit.com> wrote:
> >>A simple query for ["A" or "B" or "C"] against a KeywordIndex containing 
> >>27k objects is taking about 7 seconds on a Celeron 1.6 GHz, which seems 
> >>an absurdly long time to me.
> > 
> > <guess>
> > This time may be caused by fetching from the database. If so, then the
> > only way to speed it up is to increase the ZODB cache or get faster disks.
> > Try the former and see if it helps. </guess>
> 
> Yup, absolutely right.  Upping the cache speeds it up to something sane.
> However, I don't understand why.  The code does something like:
> 
>   set1 = self.index.get(1)
>   set2 = self.index.get(2)
>   sets = [set1, set2]
> 
> ...so the sets will have come from the ZODB.  But the bit which takes 
> the time is the following line:

These are most likely TreeSets. The actual members of a TreeSet are stored in separate persistent objects (Buckets), so that a large set can be fetched in chunks rather than all at once.

The ZODB tries to be lazy about fetching objects. If an object is very large, it often makes sense to split it up among many persistent objects so that each part can be loaded and unloaded separately. This is what BTrees and TreeSets do. When you fetch a value from a BTree or test for an element in a set, it only needs to load the part (in this case a Bucket) that contains the element, rather than the whole enchilada, which could be huge (in reality it's not quite that simple, but that's the general idea).
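A toy model may make the bucket idea concrete. This is not the real BTrees code: `ToyTreeSet`, its `bucket_size` parameter and the `loads` counter are hypothetical, and a "load" is only a counter standing in for a database fetch. It illustrates that a membership test needs to touch just the one bucket whose key range could contain the value.

```python
import bisect


class ToyTreeSet:
    """Hypothetical toy model of a bucketed set (not the real BTrees code).

    Members are split into fixed-size sorted buckets; each bucket stands in
    for a separate persistent object that must be "loaded" before use.
    """

    def __init__(self, members, bucket_size=4):
        items = sorted(set(members))
        self._buckets = [items[i:i + bucket_size]
                         for i in range(0, len(items), bucket_size)]
        # The first key of each bucket plays the role of the interior node.
        self._firsts = [b[0] for b in self._buckets]
        self.loads = 0  # number of simulated bucket fetches

    def _load(self, index):
        self.loads += 1  # pretend this is a trip to the database
        return self._buckets[index]

    def __contains__(self, value):
        # Pick the only bucket that could hold value, then load just that one.
        i = bisect.bisect_right(self._firsts, value) - 1
        if i < 0:
            return False
        bucket = self._load(i)
        j = bisect.bisect_left(bucket, value)
        return j < len(bucket) and bucket[j] == value


s = ToyTreeSet(range(0, 100, 2), bucket_size=5)  # 50 members in 10 buckets
print(42 in s, s.loads)  # True 1
print(43 in s, s.loads)  # False 2
```

Each lookup costs one bucket load no matter how many buckets the set has, which is the point of splitting a large set up.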

This is why BTrees and TreeSets are used when you need to store and manipulate arbitrarily large numbers of elements.

>   result = multiunion(sets)

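This iterates over every element of the sets, which loads all of their buckets from the database if they are not already in the cache. Fetching the TreeSet objects themselves does not fetch their buckets.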
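Here is a rough sketch of why the union is the expensive step, under stated assumptions: `ToyDB`, `ToyCache`, `make_set` and `toy_multiunion` are hypothetical names, the cache is unbounded for simplicity, and the real `multiunion` operates on BTrees sets rather than on this model. A union has to visit every bucket of every input set, so on a cold cache each bucket costs one storage read, while a warm cache makes the same query free.

```python
class ToyDB:
    """Hypothetical stand-in for storage: maps an object id to a bucket."""

    def __init__(self):
        self.objects = {}
        self.reads = 0

    def store(self, oid, bucket):
        self.objects[oid] = list(bucket)

    def read(self, oid):
        self.reads += 1  # each read is a (slow) trip to disk
        return self.objects[oid]


class ToyCache:
    """Unbounded object cache in front of ToyDB (real ZODB caches are bounded)."""

    def __init__(self, db):
        self.db = db
        self.cached = {}

    def get(self, oid):
        if oid not in self.cached:
            self.cached[oid] = self.db.read(oid)
        return self.cached[oid]


def make_set(db, name, members, bucket_size):
    """Split members into buckets, store each under its own oid, return the oids."""
    items = sorted(set(members))
    oids = []
    for n, i in enumerate(range(0, len(items), bucket_size)):
        oid = (name, n)
        db.store(oid, items[i:i + bucket_size])
        oids.append(oid)
    return oids


def toy_multiunion(cache, sets):
    """Union of several bucketed sets: every bucket of every set is visited."""
    result = set()
    for bucket_oids in sets:
        for oid in bucket_oids:
            result.update(cache.get(oid))
    return sorted(result)


db = ToyDB()
cache = ToyCache(db)
a = make_set(db, "A", range(0, 3000), 100)     # 30 buckets
b = make_set(db, "B", range(2000, 5000), 100)  # 30 buckets
c = make_set(db, "C", range(4000, 9000), 100)  # 50 buckets
print(len(toy_multiunion(cache, [a, b, c])), db.reads)  # 9000 110 (cold cache)
print(len(toy_multiunion(cache, [a, b, c])), db.reads)  # 9000 110 (warm, no new reads)
```

The sizes here are arbitrary and say nothing about the real index; they only show that the cost of the union scales with the number of buckets that are not already cached.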
 
> At which point the sets have already been fetched, no?
> 
> looking forward to the day I understand ZODB caches...;-)

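Actually, this is not so much a function of the cache as of the organization of the set objects themselves: the sets are cheap to fetch, but their contents live in buckets that are only loaded when you iterate over them.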
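To see how the bucket organization and the cache size interact, here is a hypothetical model of a bounded cache that evicts the least recently used object. The real ZODB object cache is more involved than this; `LRUCache` and `reads_for_repeated_query` are illustrative names only, and the bucket counts are arbitrary. When one query touches more buckets than the cache can hold, a sequential scan evicts each bucket before it is needed again, so every repeat of the query pays full price; once the cache holds all of them, repeats cost nothing.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache that evicts the least recently used object."""

    def __init__(self, capacity, read):
        self.capacity = capacity
        self.read = read  # function that fetches an object from storage
        self.items = OrderedDict()
        self.misses = 0

    def get(self, oid):
        if oid in self.items:
            self.items.move_to_end(oid)  # mark as most recently used
            return self.items[oid]
        self.misses += 1
        value = self.read(oid)
        self.items[oid] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value


def reads_for_repeated_query(n_buckets, capacity, repeats=3):
    """Scan the same n_buckets objects `repeats` times; return storage reads."""
    cache = LRUCache(capacity, read=lambda oid: [oid])
    for _ in range(repeats):
        for oid in range(n_buckets):
            cache.get(oid)
    return cache.misses


print(reads_for_repeated_query(110, capacity=100))  # 330: every scan misses
print(reads_for_repeated_query(110, capacity=200))  # 110: only the first scan misses
```

This is consistent with what Seb saw: raising the cache size does not make the union do less work, it just stops the buckets from being evicted and re-read between queries.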

-Casey


