[Zope3-Users] Re: getting random results out of a catalogs field index

Jürgen Kartnaller juergen at kartnaller.at
Sat May 5 15:52:02 EDT 2007



Dominique Lederer wrote:
> Christian Theune wrote:
>> On Saturday, 05.05.2007, at 17:42 +0200, Dominique Lederer wrote:
>>> Hi,
>>>
>>> I would like to retrieve a number of *random* entries out of a catalog's field index.
>>>
>>> I tried first getting the length of the catalog index and then accessing a
>>> random list index, but this is very slow because of the large number of
>>> entries in the index.
>>>
>>> Do you know a better solution?
>> I'm kind of guessing here. 
>>
>> You say you are:
>>
>> - querying the catalog
>> - accessing a random index from the result set
>> - noticing that this is slow
>>
>> Does this only happen if the index is very large, e.g. you're retrieving
>> an element from the end of the result set?
>>
>> I don't know exactly how the result sets are organized, but this
>> behaviour would imply that loading a later element triggers something
>> like loading the earlier elements too. I can't really imagine that.
>>
>> I think the general reason this is slow is that randomly selecting
>> elements means
>>
>> a) you need access to the full list of things
>> b) applying a sort 
>>
>> Sorting has a complexity of at least O(n log n), which becomes slow
>> enough for large sets that it's noticeable.
>>
>> BTW: How large is large?
>>
>> Christian
>>
> 
> Hi, thanks for the reply. I just managed to improve the performance of my query
> significantly.
> 
> What I wanted to do was:
> 
> - retrieve the len() of the catalog index
> - retrieve a list() of the result set
> - access n random results and their objects
> 
> To retrieve a random object I did:
> 
> import random
>
> query = catalog.apply({'myIndex': (None, None)})  # (None, None): no bounds, match every value
> length = len(query)
> index_intids = list(query)
> intid = index_intids[random.randint(0, length - 1)]
> obj = getObject(intid)
> 
> With 10000 items in the index this was slow (I had to wait 2-3 seconds for a
> view to render).
> 
> After looking into the field index implementation I changed the above lines to:
> 
> length = len(catalog['myIndex']._rev_index)

If you are using a FieldIndex, use

length = catalog['myIndex'].documentCount()

The FieldIndex holds a counter with the number of entries in the _rev_index.
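The counter idea can be sketched in plain Python. This is a hypothetical toy, not the zope.index code: the class name CountedIndex is made up, a dict stands in for the BTree-based _rev_index, and a plain int stands in for the index's counter. The method names only mirror the ones discussed here. The point is that the count is updated on every index/unindex call, so reading it never has to walk the mapping.

```python
class CountedIndex:
    """Hypothetical toy index that keeps a document counter next to its
    reverse mapping (docid -> value), so counting never walks the mapping."""

    def __init__(self):
        self._rev_index = {}  # docid -> indexed value (a dict stands in for a BTree)
        self._num_docs = 0    # kept in sync by index_doc/unindex_doc

    def index_doc(self, docid, value):
        # Only a new docid changes the count; re-indexing just replaces the value.
        if docid not in self._rev_index:
            self._num_docs += 1
        self._rev_index[docid] = value

    def unindex_doc(self, docid):
        # Removing an unknown docid is a no-op and leaves the count alone.
        if docid in self._rev_index:
            del self._rev_index[docid]
            self._num_docs -= 1

    def documentCount(self):
        # O(1): read the maintained counter instead of counting entries.
        return self._num_docs


idx = CountedIndex()
for docid, value in enumerate(['a', 'b', 'c']):
    idx.index_doc(docid, value)
idx.index_doc(1, 'z')  # re-index docid 1: count unchanged
idx.unindex_doc(0)
print(idx.documentCount())  # → 2
```

The trade-off is a tiny bit of bookkeeping on every write in exchange for constant-time reads of the count.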

> index_intids = list(catalog['myIndex']._rev_index.keys())
> 
> This now works like a charm.
> 
> I am not an expert on BTrees, so I can't really say what the problem is/was.

len() on a BTree is slow because the BTree stores no total size: it has to
walk all of its buckets, loading each one from the database, to count the keys!
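A toy model makes the cost visible. This is not the BTrees implementation; Bucket, ToyTree and build are hypothetical names. It only models a leaf level made of linked buckets with no stored total, so len() has to visit every bucket. In a ZODB-backed tree each of those visits may mean loading a bucket from storage.

```python
class Bucket:
    """Hypothetical leaf bucket: a small run of keys plus a link to the next bucket."""

    def __init__(self, keys, next_bucket=None):
        self.keys = keys
        self.next = next_bucket


class ToyTree:
    """Toy stand-in for a BTree's leaf level; it keeps no total size."""

    def __init__(self, first_bucket):
        self.first = first_bucket
        self.buckets_visited = 0  # instrumentation for the example only

    def __len__(self):
        # No stored total: walk the whole bucket chain and sum bucket sizes.
        total = 0
        bucket = self.first
        while bucket is not None:
            self.buckets_visited += 1
            total += len(bucket.keys)
            bucket = bucket.next
        return total


def build(n, bucket_size):
    # Split keys 0..n-1 into buckets, linking the chain from back to front.
    head = None
    for start in reversed(range(0, n, bucket_size)):
        head = Bucket(list(range(start, min(start + bucket_size, n))), head)
    return ToyTree(head)


tree = build(10000, 100)
print(len(tree), tree.buckets_visited)  # → 10000 100
```

Every call to len() repeats the whole walk, which is why reading a maintained counter such as documentCount() is preferable for large indexes.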


If possible, always avoid going through the catalog and use the index
directly; it is much faster!
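For the original goal (n random entries), a sketch of sampling straight from the index keys could look like this. Assumptions: random_intids is a hypothetical helper, rev_index is any mapping keyed by intid such as the FieldIndex's private _rev_index used above, and a dict stands in for it in the usage example. random.sample draws distinct items in one pass over the list, with no sorting involved.

```python
import random


def random_intids(rev_index, n, rng=random):
    """Hypothetical helper: return up to n distinct random intids.

    rev_index is assumed to be a mapping keyed by intid, like the
    FieldIndex._rev_index used in this thread (a private attribute).
    """
    intids = list(rev_index.keys())  # materialize the keys once
    # sample() picks distinct elements without sorting the list.
    return rng.sample(intids, min(n, len(intids)))


# A dict stands in for the index's BTree here.
rev_index = {intid: 'value-%d' % intid for intid in range(1000, 1010)}
picked = random_intids(rev_index, 3, random.Random(42))
# In Zope, each intid would still go through the getObject step
# from the quoted code to get the actual object.
```

Passing an explicit random.Random instance keeps the selection reproducible in tests while the default uses the module-level generator.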

Jürgen
