[Zope-CMF] Re: reindexing optimizations

Florent Guillaume fg at nuxeo.com
Sat Nov 19 14:12:41 EST 2005


Alec Mitchell wrote:
> Howdy CMFers,
> 
> So, Sidnei has been plugging away at the "AT reindexes things an obscene 
> number of times" issue today, and appears to have fixed many of the AT 
> triggered indexing redundancies.  There are however still a few places in 
> CMF where some cataloging redundancy might be avoided.  One obvious place is 
> during object creation, where the following happens:
> 
> *) TypesTool.constructInstance() is triggered
>     **) A _setObject call results in CMFCatalogAware.manage_afterAdd() which 
> triggers a full indexObject().
>     *) This is shortly followed by TypesTool._finishConstruction()
>         *) Which calls CMFCatalogAware.notifyWorkflowCreated()
>             *) Which in turn calls WorkflowTool._reindexWorkflowVariables()
>                 **) Which does a CMFCatalogAware.reindexObject([idxs]) on 
> workflow specific variables (with a full metadata update)
>                 *) And calls CMFCatalogAware.reindexObjectSecurity() which 
> reindexes the object only on the security index, and doesn't touch metadata.
>         **) TypesTool._finishConstruction() then does another 
> CMFCatalogAware.reindexObject().
> 
> So we have two full reindexes, and three metadata updates.  The last reindex 
> appears to be there only to catch the change to 'portal_type' in 
> _finishConstruction.  So this final reindexObject might safely be changed 
> to reindexObject(['portal_type', 'Type']),

This was the case in my initial code, but Yuppie changed it:
http://svn.zope.org/trunk/CMFCore/TypesTool.py?rev=35903&r1=35864&r2=35903
I don't remember what the reason was, though I believe it was discussed 
a bit at the time on the lists.
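
To make the count above concrete, here is a toy tally of the creation
sequence Alec describes. It is only a sketch: Tally is a hypothetical
stand-in for the catalog, and the index names passed in are
illustrative.

```python
class Tally:
    """Hypothetical stand-in that records each catalog call."""

    def __init__(self):
        self.events = []

    def catalog_object(self, idxs=None, update_metadata=True):
        # idxs=None stands for "reindex every index" (a full reindex).
        self.events.append((idxs is None, update_metadata))


def construct_instance(cat):
    # manage_afterAdd() -> indexObject(): full index plus metadata
    cat.catalog_object()
    # _reindexWorkflowVariables(): partial reindex, metadata updated
    cat.catalog_object(idxs=["review_state"])
    # reindexObjectSecurity(): security index only, metadata untouched
    cat.catalog_object(idxs=["allowedRolesAndUsers"], update_metadata=False)
    # _finishConstruction() -> reindexObject(): full reindex plus metadata
    cat.catalog_object()


tally = Tally()
construct_instance(tally)
full_reindexes = sum(1 for is_full, _ in tally.events if is_full)
metadata_updates = sum(1 for _, meta in tally.events if meta)
```

Under these assumptions the tally matches the description: four
catalog calls, two of them full, three of them touching metadata.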

> though the possibility exists 
> that other indexed attributes added by 3rd parties may depend on the value 
> of portal_type (say, I use an autogenerated Title which includes the Type).  
> Additionally, almost immediately before this last reindexObject call, 
> another reindexObject call has happened in notifyWorkflowCreated, which 
> included a full catalog metadata update.  As a result, updating the catalog 
> metadata here is certainly redundant.  Unfortunately, the 
> CMFCatalogAware.reindexObject method provides no means of avoiding the 
> duplicate metadata update, though it would be trivial to add and to use 
> here.

But as you realize, there is a problem when metadata is computed 
using methods. As portal_type / Type shows, modifying one attribute 
doesn't mean that only one metadata column (or index, for that 
matter) changes.
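
A minimal sketch of that dependency problem, in plain Python rather
than CMF code. ToyCatalog and Doc are hypothetical stand-ins, and the
autogenerated Title follows Alec's example.

```python
class ToyCatalog:
    """Hypothetical catalog: one stored record of values per object."""

    def __init__(self, index_names):
        self.index_names = list(index_names)
        self.data = {}

    def reindex(self, key, obj, idxs=None):
        record = self.data.setdefault(key, {})
        # No idxs means "refresh every known index".
        for name in (idxs or self.index_names):
            value = getattr(obj, name)
            record[name] = value() if callable(value) else value


class Doc:
    portal_type = "Document"

    def Title(self):
        # Computed value that silently depends on portal_type.
        return "Untitled %s" % self.portal_type


catalog = ToyCatalog(["portal_type", "Title"])
doc = Doc()
catalog.reindex("/doc", doc)

doc.portal_type = "News Item"
catalog.reindex("/doc", doc, idxs=["portal_type"])
stale_title = catalog.data["/doc"]["Title"]   # still the old value

catalog.reindex("/doc", doc)
fresh_title = catalog.data["/doc"]["Title"]   # refreshed by the full pass
```

The partial reindex leaves the stored Title out of date; only the
full pass picks up the change.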

> Another option, suggested by Sidnei on IRC, would avoid the potential 
> issues with limiting the variables indexed in the final reindex: let 
> CMFCatalogAware.manage_afterAdd know (presumably via some state 
> variable) that it is being invoked through constructInstance/invokeFactory, 
> in which case it could safely skip the initial indexing and allow 
> _finishConstruction to take care of indexing the object fully on its own at 
> the end.

That's certainly a good hack. There are several ways to do it: with 
a thread-local variable, in the request, or by walking the 
stack's locals to check for a __dont_index__ attribute... You'd have to 
benchmark, but a thread-local variable is probably the fastest. You want to 
store a set of objects whose indexing should be skipped.

> In the long term we will probably be better served by delaying all 
> indexing to transaction boundaries, though it will be a fair bit harder to 
> implement, and may irk some developers who depend on immediate changes to 
> the catalog on reindex.

As Julien said, it's not very hard to implement; it's just that there are 
application changes to consider. Still, there's agreement that CMF 
should move in that direction, and I can provide patches taken from the CPS 
implementation. (It requires Zope 2.8/ZODB 3.4, of course.) Some of 
the framework should be pushed into Zope itself.
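
To give an idea of the shape of such a mechanism, here is a small
sketch of a queue that merges reindex requests and processes each
object once. It is not the CPS implementation: IndexQueue and its
methods are hypothetical names, and flush() is called by hand where a
real setup would presumably run it from a hook just before commit.

```python
class IndexQueue:
    """Hypothetical queue that merges reindex requests per object."""

    def __init__(self):
        # key -> set of index names, or None meaning "all indexes"
        self._pending = {}

    def request(self, key, idxs=None):
        already_full = key in self._pending and self._pending[key] is None
        if idxs is None or already_full:
            # A full reindex absorbs any partial request.
            self._pending[key] = None
        else:
            self._pending.setdefault(key, set()).update(idxs)

    def flush(self, reindex):
        # Swap the queue out first so a failing callback cannot replay it.
        pending, self._pending = self._pending, {}
        for key, idxs in pending.items():
            reindex(key, None if idxs is None else sorted(idxs))
        return len(pending)


queue = IndexQueue()
queue.request("/doc", ["review_state"])
queue.request("/doc", ["allowedRolesAndUsers"])
queue.request("/other")            # full reindex
queue.request("/other", ["Type"])  # absorbed by the full reindex

done = []
processed = queue.flush(lambda key, idxs: done.append((key, idxs)))
```

Four requests collapse into two catalog operations. The application
change mentioned above is visible here too: nothing reaches the
catalog until flush() runs.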

Florent

-- 
Florent Guillaume, Nuxeo (Paris, France)   Director of R&D
+33 1 40 33 71 59   http://nuxeo.com   fg at nuxeo.com

