[Zope3-Users] getting random results out of a catalogs field index

Dominique Lederer dominique.lederer at inode.at
Sat May 5 13:27:10 EDT 2007


Christian Theune wrote:
> Am Samstag, den 05.05.2007, 17:42 +0200 schrieb Dominique Lederer:
>> hi
>>
>> i would like to retrieve a number of *random* entries out of a catalog's field index.
>>
>> i tried it by first getting the catalog index length and then accessing a
>> randomized list index, but this is very slow because of the large number of
>> entries in the index.
>>
>> do you know any better solution?
> 
> I'm kind of guessing here. 
> 
> You say you are:
> 
> - querying the catalog
> - accessing a random index from the result set
> - noticing that this is slow
> 
> Does this only happen if the index is very large, e.g. you're retrieving
> an element from the end of the result set?
> 
> I don't know exactly how the result sets are organized, but this
> behaviour would imply that loading a later element triggers something
> like loading the earlier elements too. I can't really imagine that.
> 
> I think the general problem that this is slow lies in the fact that
> randomly selecting elements means 
> 
> a) you need access to the full list of things
> b) applying a sort 
> 
> Sorting has a complexity of at least O(n log n), which becomes slow
> enough for large sets that it's noticeable.
> 
> BTW: How large is large?
> 
> Christian
> 

hi, thanks for the reply. i just managed to improve the performance of my query
significantly.

what i wanted to do was:

- retrieve the len() of the catalog index
- retrieve a list() of the result set
- access n random results and their objects

to retrieve a random object i did:

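import random

query = catalog.apply({'myIndex': (None, None)})  # all entries in the index
length = len(query)
index_intids = list(query)  # materialize the result set so it can be indexed
intid = index_intids[random.randint(0, length - 1)]  # pick a random position
obj = getObject(intid)  # look up the object for that intid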

with 10000 items in the index this was slow (i had to wait 2-3 seconds for a
view to render).

after looking into the field index implementation i changed the above lines to:

length = len(catalog['myIndex']._rev_index)
index_intids = list(catalog['myIndex']._rev_index.keys())

which now works like a charm.
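
to pick several random entries at once, something like the following sketch
might do. it is only an illustration: pick_random_intids is a made-up helper
name, and the plain range of integers stands in for the intids that would come
from list(catalog['myIndex']._rev_index.keys()). it uses random.sample from the
standard library, so no sorting is involved.

```python
import random


def pick_random_intids(intids, n, rng=random):
    """Return up to n distinct intids chosen at random (hypothetical helper)."""
    pool = list(intids)         # materialize once so positions can be sampled
    n = min(n, len(pool))       # never ask for more items than are available
    return rng.sample(pool, n)  # sampling without replacement, no sort needed


# stand-in data: in the real view this would be the list of intids
# taken from the field index
fake_intids = range(1000, 11000)
chosen = pick_random_intids(fake_intids, 5)
```

each intid in chosen could then be passed to getObject() as in the lines above.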

i am not an expert with BTrees, so i can't really say what the problem is/was.

Dominique





