[Zope-dev] Catalog performance

John Barratt jlb at ball.langarson.com.au
Wed Sep 10 22:41:15 EDT 2003


Max M wrote:

> Nguyen Quan Son wrote:
>  > Hi,
>  > I have a problem with performance and memory consumption when trying
>  > to do some statistics, using following code:
>  > ...
>  > docs = container.portal_catalog(meta_type='Document', ...)
>  > for doc in docs:
>  >     obj = doc.getObject()
>  >     value = obj.attr
>  >     ...
>  >
>  > With about 10.000 documents this Python script takes 10 minutes and
>  > more than 500MB of memory, after that I had to restart Zope. I am
>  > running Zope 2.6.1 + Plone 1.0 on Windows 2000, Xeon P4 with 1GB RAM.
>  > What's wrong with this code? Any suggestion is appreciated.
>  > Nguyen Quan Son.
> 
> Most likely you are filling the memory of your server so that you are 
> swapping to disk.
> 
> Try cutting the query into smaller pieces so that the memory doesn't get 
> filled up.
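
Max's suggestion of cutting the work into smaller pieces could be sketched 
roughly as below.  This is only an illustration: in_batches, 
process_in_batches, handle and after_batch are made-up names, not part of 
Zope or the catalog.  In unrestricted code the after_batch hook is where 
you might ask the ZODB connection to trim its cache (for example with its 
cacheGC() method), but that part is left as a comment here.

```python
def in_batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch
        yield batch


def process_in_batches(items, size, handle, after_batch=None):
    """Apply `handle` to every item, pausing after each batch.

    `after_batch` is a hypothetical hook; in Zope it might do something
    like some_object._p_jar.cacheGC() (assumption, unrestricted code only).
    """
    results = []
    for batch in in_batches(items, size):
        results.extend(handle(item) for item in batch)
        if after_batch is not None:
            after_batch()
    return results


# Stand-in data: plain integers instead of catalog results
hook_calls = []
totals = process_in_batches(range(10), 4, lambda n: n * 2,
                            after_batch=lambda: hook_calls.append(1))
```

With 10 items and a batch size of 4 the hook runs three times, once after 
each batch, including the final short one.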

If you can't use catalog metadata as Seb suggests (e.g. you are actually 
accessing many attributes, large values, etc.) and if memory is indeed 
the problem (which seems likely), then you can re-ghostify the objects 
that were ghosts to begin with.  This will save memory, unless all those 
objects were already in the cache.

The problem with this strategy, though, is that the doc.getObject() method 
used in your code activates the object, so you won't know whether it was 
a ghost beforehand.  To get around this you can bypass that method and do 
something like:

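docs = container.portal_catalog(meta_type='Document', ...)
for doc in docs:
    # Traverse to the object directly instead of calling doc.getObject()
    obj = doc.aq_parent.unrestrictedTraverse(doc.getPath())
    # _p_changed is None while the object is still a ghost
    was_ghost = obj._p_changed is None
    value = obj.attr
    if was_ghost:
        # Turn it back into a ghost so it stops taking up memory
        obj._p_deactivate()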

You can test this by running your code on a freshly restarted server and 
checking the number of objects in the cache.  The number shouldn't change 
much after running the code above, but it will increase dramatically if 
you just use 'obj = doc.getObject()' instead, or skip deactivating the 
objects.  Fewer objects in your cache should in turn keep your memory 
usage down and stop your computer paging during the request, which should 
speed things up considerably!
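
To see the effect without a Zope instance, here is a small simulation of 
the idea.  FakeDoc is a made-up stand-in, not the real ZODB persistent 
class; it only mimics the two things the loop relies on, namely that 
_p_changed is None for a ghost and that _p_deactivate() turns an 
unmodified object back into one.  Treat it as a sketch of the bookkeeping, 
not a measurement of real memory use.

```python
class FakeDoc:
    """Stand-in for a persistent object (NOT the real ZODB class)."""

    def __init__(self, value):
        self._stored = value
        self._p_changed = None      # None means "ghost"

    @property
    def attr(self):
        # Touching an attribute "loads" the object, as activation would
        if self._p_changed is None:
            self._p_changed = False  # loaded and unmodified
        return self._stored

    def _p_deactivate(self):
        # Only unmodified objects go back to being ghosts
        if self._p_changed is False:
            self._p_changed = None


def count_loaded(objs):
    """Number of objects that are not ghosts, i.e. 'in the cache'."""
    return sum(1 for o in objs if o._p_changed is not None)


def read_all(objs, restore_ghosts):
    values = []
    for obj in objs:
        was_ghost = obj._p_changed is None
        values.append(obj.attr)
        if restore_ghosts and was_ghost:
            obj._p_deactivate()
    return values


def fresh_docs():
    docs = [FakeDoc(i) for i in range(1000)]
    docs[0].attr    # pretend one object was already loaded
    return docs


careful = fresh_docs()
read_all(careful, restore_ghosts=True)
loaded_careful = count_loaded(careful)

naive = fresh_docs()
read_all(naive, restore_ghosts=False)
loaded_naive = count_loaded(naive)

print(loaded_careful, loaded_naive)   # prints: 1 1000
```

The object that was already loaded stays loaded in both runs; only the 
ones that started as ghosts are put back.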

Another option would be to reduce the size of your cache so that the 
memory your Zope instance consumes doesn't cause your computer to swap.  
The code changes above will also help keep the 'right' objects in your 
cache, which in turn should further help the performance of subsequent 
requests.

Cheers,

JB.







