[Zope] Major problems with slow catalog queries

Lea Smith richandleacv@yahoo.com
Thu, 19 Sep 2002 13:33:05 +0100 (BST)


The catalog search times mentioned below are the time
taken by the catalog query alone, measured when it is
called from a Python script like this...

context.do_nothing_script()
results = context.portal_catalog( search_dict )
context.do_nothing_script()

The 'do_nothing_script' does just that, but with the
CallProfiler Product it shows me exactly how long the
catalog call takes.
The 'search_dict' is a dictionary holding the queries
for the catalog indexes; I found this improved the
catalog speed a little compared with passing the whole
REQUEST dict.
(This is being run on a PIII 900 MHz with Windows 2000,
Zope 2.5.1 and Python 2.1.3.)
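A minimal sketch of timing the call directly, as an alternative to bracketing it with a do-nothing script. It assumes the code runs outside Zope's restricted Python Scripts (for example in a standalone test harness) and uses Python 3's `time.perf_counter`; `time_call` and `fake_catalog` are hypothetical names, with `fake_catalog` only standing in for `context.portal_catalog(search_dict)`.

```python
import time

def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Hypothetical stand-in for context.portal_catalog(search_dict).
def fake_catalog(search_dict):
    return [docid for docid in range(10500) if docid % 25 == 0]

results, seconds = time_call(fake_catalog, {'Type': ['Classified']})
print(len(results), "results in %.4f seconds" % seconds)
```

Timing a single call this way measures the same thing as the profiler bracket, but repeating it a few times and taking the minimum gives a steadier figure.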

>If you can isolate a case where searching the catalog alone
>(with no other operations being performed) takes minutes, I
>would like a chance to analyse it closer.

I am digging around my code at the moment trying to
find problems I have created myself :-). I have
discovered a few errors and this has helped with the
speed.

I have two search forms:

1.
Basic search which searches on:
'SearchableText' (textindex), 
'Subject' (keywordindex), 
'Type' (keywordindex), 
'sort_on', and 'sort_order'

I have managed to improve the catalog search to around
0.5 to 1 second. The 'sort_on' parameter seems to chew
up most of this time. Is there any way to improve the
'sort_on' speed?
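The cost of 'sort_on' grows with the number of results, because each result's sort value has to be looked up and compared. If only the first page of results is ever displayed, one general technique is to select just the newest N instead of sorting everything. This is a plain-Python sketch of that idea using a hypothetical `modified` mapping (document id to modification time); it is not a ZCatalog option or the catalog's own sorting code.

```python
import heapq
import random

# Hypothetical forward index: document id -> modification time in seconds.
random.seed(0)
modified = {docid: random.uniform(0, 1e9) for docid in range(10500)}

# Hypothetical result set: the ids matched by the other index queries.
result_ids = list(range(0, 10500, 25))

def sort_all(ids, key_index):
    """Full newest-first sort, comparable to sort_order='reverse'."""
    return sorted(ids, key=key_index.__getitem__, reverse=True)

def top_n(ids, key_index, n):
    """Only the n newest ids; results that are never shown are not fully sorted."""
    return heapq.nlargest(n, ids, key=key_index.__getitem__)

first_page = top_n(result_ids, modified, 20)
```

`heapq.nlargest` does roughly O(len(ids) * log n) work instead of O(len(ids) * log len(ids)), so the saving grows as the result set grows relative to the page size.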

2.
Advanced search which can search on one or more of the
following:
'SearchableText' (textindex), 
'Subject' (keywordindex), 
'Type' (keywordindex), 
'type' (keywordindex), 
'modified' (fieldindex {passing 'query':'blah' and
'range':'min' in a dict} ),
'item_size' (fieldindex), 
'value' (fieldindex {passing 'query':['q_min','q_max']
 and 'range':'minmax' in dict} ),
'Creator' (fieldindex), 
'state' (fieldindex), 
'country' (fieldindex), 
'sort_on', and 'sort_order'.
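A sketch of building 'search_dict' for the advanced form, copying only known, non-empty fields instead of passing the whole REQUEST, and wrapping the two range indexes in the 'query'/'range' dicts described above. The form field names 'modified_min', 'value_min' and 'value_max' are hypothetical, and `form` can be any mapping with a `get` method; the function is written in plain Python and has not been tried inside a Zope Python Script.

```python
# Index names taken from the advanced search list above.
KEYWORD_INDEXES = ('Subject', 'Type', 'type')
PLAIN_INDEXES = ('SearchableText', 'item_size', 'Creator', 'state', 'country')

def build_search_dict(form):
    """Return a catalog query holding only the fields the user filled in."""
    query = {}
    for name in PLAIN_INDEXES + KEYWORD_INDEXES:
        value = form.get(name)
        if value:                      # skip missing or empty fields
            query[name] = value
    # Open-ended range on 'modified' (hypothetical 'modified_min' field).
    if form.get('modified_min'):
        query['modified'] = {'query': form['modified_min'], 'range': 'min'}
    # Bounded range on 'value' (hypothetical 'value_min'/'value_max' fields).
    lo, hi = form.get('value_min'), form.get('value_max')
    if lo is not None and hi is not None:
        query['value'] = {'query': [lo, hi], 'range': 'minmax'}
    for key in ('sort_on', 'sort_order'):
        if form.get(key):
            query[key] = form[key]
    return query

form = {'Type': ['Classified'], 'SearchableText': '', 'value_min': 100,
        'value_max': 500, 'sort_on': 'modified', 'sort_order': 'reverse',
        'unrelated_field': 'x'}
search_dict = build_search_dict(form)
```

Leaving out empty criteria matters as well as tidiness: an index that receives no query is not consulted at all, while an index given an empty or over-broad query may still be searched.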

I have noticed that the 'modified' parameter causes a
huge hit on the catalog's performance. Is this normal?
Are there things I can do to improve it?

Passing the following to the portal_catalog causes the
catalog to take around 35-45 seconds on average (these
queries return 400 results from the 10,500 items
cataloged)...

{'modified': {'range': 'min', 'query':
DateTime('1995/01/01')}, 'Type': ['Classified'],
'sort_on': 'modified', 'review_state': 'published',
'sort_order': 'reverse'} 

Taking out the 'modified' key and passing the
following reduces the catalog search time to approx
0.5 to 1 second on average...

{'Type': ['Classified'], 'sort_on': 'modified',
'review_state': 'published', 'sort_order': 'reverse'} 

The catalog returns results within 0.5 to 1 second
with any combination of the indexes searched on as
listed for the advanced search above. But as soon as
the modified query is put into the search_dict it all
goes out the window. 
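A simplified model that may help explain the slowdown. It assumes (this is not the FieldIndex source) that a 'min' range query is answered by walking every distinct indexed value at or above the bound and merging the document set stored under each one. Under that assumption, a bound of 1995/01/01 that precedes every item means merging one set per distinct modification time, which is roughly one per object. `ToyFieldIndex` and the timestamps are hypothetical.

```python
import bisect

class ToyFieldIndex:
    """Simplified model of a field index (NOT Zope's implementation):
    sorted distinct values, each mapped to the set of document ids."""

    def __init__(self):
        self.keys = []       # sorted list of distinct indexed values
        self.postings = {}   # value -> set of document ids

    def index(self, docid, value):
        if value not in self.postings:
            bisect.insort(self.keys, value)
            self.postings[value] = set()
        self.postings[value].add(docid)

    def range_min(self, lo):
        """Ids with value >= lo, plus how many distinct keys were merged."""
        start = bisect.bisect_left(self.keys, lo)
        result, visited = set(), 0
        for key in self.keys[start:]:
            result |= self.postings[key]   # one set union per distinct value
            visited += 1
        return result, visited

# Hypothetical data: 10,500 items, one edit every 50 minutes from 1995/01/01 (UTC).
START = 788918400  # 1995-01-01 00:00:00 UTC in seconds since the epoch
idx = ToyFieldIndex()
for docid in range(10500):
    idx.index(docid, START + docid * 3000)   # every modification time is unique

everything, visited_all = idx.range_min(START)                # like range 'min' 1995/01/01
recent, visited_recent = idx.range_min(START + 10400 * 3000)  # a much narrower range
```

In this model the cost depends on how many distinct values fall inside the range, not on how many results survive the other criteria: the wide range merges 10,500 sets while the narrow one merges 100.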

I had experimented with creating an index which
cataloged the modified date as a floating point
number. Searching on the floating point number
(modified date) made no difference to the search
speed. As I wrote this, I wondered whether cataloging
the date as an integer would be any different.
Searching the modified date as an integer has reduced
the search time from an average of 35-45 seconds down
to 1.5 - 5 seconds. Still not as fast as I would like,
but a whole lot better. Does this suggest I have a
problem with Zope or the CMF? I have run out of ideas
to improve it further.
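One possible integer encoding, assuming day resolution is precise enough for the search form: store the date as a YYYYMMDD integer. This keeps chronological order, so 'min' ranges still work, while many modification times on the same day share one key. `modified_day_key` is a hypothetical name for the method the index would catalog, and the sketch uses Python's `datetime` rather than Zope's `DateTime`.

```python
from datetime import datetime, timezone

def modified_day_key(dt):
    """Hypothetical indexable value: a date as an integer YYYYMMDD."""
    return dt.year * 10000 + dt.month * 100 + dt.day

# The 'min' bound from the query above, encoded the same way.
lower_bound = modified_day_key(datetime(1995, 1, 1))

# Hypothetical data: 10,500 items, one edit every 50 minutes from 1995-01-01 UTC.
stamps = [datetime.fromtimestamp(788918400 + i * 3000, tz=timezone.utc)
          for i in range(10500)]
keys = {modified_day_key(s) for s in stamps}
print(lower_bound, len(keys))
```

With this made-up data, 10,500 timestamps collapse onto 365 distinct keys, so a range search that has to visit each distinct key has far less to do. Whether that matches the integer encoding used above is an assumption.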

>Some questions for you first:
>How many objects are in your catalog, 11,000?

At this stage there are around 10,500 objects. These
are stored in 8 'Portal Folder' folders in a generic
member account. One of these folders holds approx 5600
of these items, could this be causing a problem for
the catalog? Future content will be created by
individual members in the cmf. The generic account
won't have more content added to it. Content will more
than likely be slowly deleted or moved from this
generic account to another member account for someone
to take over.
The objects are instances of Products which inherit
from either Skinned_Folder or Link from the
CMFDefault. And all objects inherit
DefaultDublinCoreImpl from CMFDefault. 

>How many indexes are you searching simultaneously?
>What kinds of indexes are these?
>Do the searches involve globbing? (* and ? wildcards for text
>searches).

The Vocabulary in use is the default 'ZopeSplitter'
with globbing enabled.
I haven't got as far as testing search speeds with
wildcards as yet.

>Globbing searches can be achingly slow using TextIndex/Vocabulary
>(the vocabulary is at fault). ZCTextIndex seems to perform much
>better in this regard.
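A simplified illustration of why wildcard searches can be slow in a vocabulary-based text index: a general glob pattern has to be tested against every word in the vocabulary, whereas a pattern with only a trailing '*' can be answered by a binary search over a sorted word list. This is not the Vocabulary's or ZCTextIndex's actual code; the word list is hypothetical.

```python
import bisect
import fnmatch

# Hypothetical vocabulary of indexed words, kept sorted.
vocabulary = sorted(["catalog", "category", "cattle", "classified", "dog", "zope"])

def glob_scan(words, pattern):
    """Test the pattern against every word: cost grows with vocabulary size."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]

def prefix_range(sorted_words, prefix):
    """Trailing-'*' case: jump to the first candidate, stop at the first miss."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

by_scan = glob_scan(vocabulary, "cat*")
by_prefix = prefix_range(vocabulary, "cat")
```

Both return the same words, but the scan touches the whole vocabulary on every search, which is what hurts once the vocabulary holds tens of thousands of words.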

Any help you can give, or pointers to where I can find
examples or information on the performance of similar
Zope/CMF sites (as a guide to what I should be able to
squeeze out of Zope), would be a great help.

Thanks for your time, it is very much appreciated.

Richard

