[Zope-CMF] reindexing optimizations

Chris Withers chris at simplistix.co.uk
Mon Nov 21 09:30:36 EST 2005


Hi Alec,

Alec Mitchell wrote:
> So, Sidnei has been plugging away at the "AT reindexes things an obscene 
> number of times" issue today, and appears to have fixed many of the AT 
> triggered indexing redundancies. 

Where is this work being done? I'd be very interested to track it...

> *) TypesTool.constructInstance() is triggered
>     **) A _setObject call results in CMFCatalogAware.manage_afterAdd() which 
> triggers a full indexObject().

So this one should go away somehow :-S

>     *) This is shortly followed by TypesTool._finishConstruction()
>         *) Which calls CMFCatalogAware.notifyWorkflowCreated()
>             *) Which in turn calls WorkflowTool._reindexWorkflowVariables()
>                 **) Which does a CMFCatalogAware.reindexObject([idxs]) on 
> workflow specific variables (with a full metadata update)

And this one too?

>                 *) And calls CMFCatalogAware.reindexObjectSecurity() which 
> reindexes the object only on the security index, and doesn't touch metadata.

Does reindexObjectSecurity do anything other than just reindex the 
security indexes? If not, it can go too ;-)

>         **) TypesTool._finishConstruction() then does another 
> CMFCatalogAware.reindexObject().

...leaving just this one :-)

> So we have two full reindexes, and three metadata updates.  The last reindex 
> appears to be there only to catch the change to 'portal_type' in 
> _finishConstruction. 

Well, it's the last one, so I'd argue it should be the _only_ one. Why 
do things need to be indexed before then?

> Additionally, almost immediately before this last reindexObject call, 
> another reindexObject call has happened in notifyWorkflowCreated, which 
> included a full catalog metadata update.  As a result, updating the catalog 
> metadata here is certainly redundant.  Unfortunately, the 
> CMFCatalogAware.reindexObject method provides no means of avoiding the 
> duplicate metadata update, though it would be trivial to add and to use 
> here.

That sounds like a good idea :-)
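
A minimal sketch of such a switch, using toy stand-ins rather than the real CMF classes. The `update_metadata` parameter on `reindexObject`, and the `ToyCatalog` and `ToyContent` classes, are assumptions made for illustration only:

```python
class ToyCatalog:
    """Stand-in for the catalog: counts the work instead of doing it."""

    def __init__(self):
        self.index_updates = 0
        self.metadata_updates = 0

    def catalog_object(self, obj, idxs=(), update_metadata=True):
        # Index values are always refreshed; metadata only on request.
        self.index_updates += 1
        if update_metadata:
            self.metadata_updates += 1


class ToyContent:
    """Stand-in for a catalog-aware content object."""

    def __init__(self, catalog):
        self.catalog = catalog

    def reindexObject(self, idxs=(), update_metadata=True):
        # update_metadata is the hypothetical new flag: callers that know
        # the metadata was refreshed a moment ago can pass False.
        self.catalog.catalog_object(self, idxs, update_metadata)


catalog = ToyCatalog()
obj = ToyContent(catalog)

# Workflow reindex: a few indexes plus a metadata update.
obj.reindexObject(idxs=["review_state"])
# Final reindex: all indexes, but skip the now-redundant metadata update.
obj.reindexObject(update_metadata=False)
```

Defaulting the flag to True keeps existing callers unchanged; only the final call in the construction path would need to opt out.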

> Another option suggested by Sidnei on IRC, which would avoid the potential 
> issues with limiting the variables indexed in the final reindex, would be 
> to let CMFCatalogAware.manage_afterAdd know (presumably via some state 
> variable) 

Why a state variable rather than just a parameter?

> that it is being invoked through constructInstance/invokeFactory, 
> in which case it could safely skip the initial indexing and allow 
> _finishConstruction to take care of indexing the object fully on its own at 
> the end.  

+1 from me.
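
A rough sketch of the parameter variant, again with toy classes. The `index` argument on `manage_afterAdd` and the `construct_instance` helper are hypothetical, and the toy leaves out `_setObject`, which sits between the two calls in the trace above:

```python
class ToyCatalog:
    """Stand-in for the catalog: records who asked for indexing."""

    def __init__(self):
        self.calls = []

    def index(self, obj, reason):
        self.calls.append(reason)


class ToyContent:
    """Stand-in for a catalog-aware content object."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.portal_type = None

    def manage_afterAdd(self, index=True):
        # `index` is the hypothetical parameter: a factory that will
        # reindex at the end passes False to skip the initial indexing.
        if index:
            self.catalog.index(self, "manage_afterAdd")

    def reindexObject(self):
        self.catalog.index(self, "reindexObject")


def construct_instance(catalog, portal_type):
    """Toy factory: add the object, finish construction, index once."""
    obj = ToyContent(catalog)
    obj.manage_afterAdd(index=False)   # no indexing yet
    obj.portal_type = portal_type      # the late change that needs indexing
    obj.reindexObject()                # the single, final index
    return obj


catalog = ToyCatalog()
construct_instance(catalog, "Document")
```

Objects added directly, outside the factory, still get indexed in `manage_afterAdd` because the default stays True.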

> In the long term we will probably be better served by delaying all 
> indexing to transaction boundaries, though it will be a fair bit harder to 
> implement, and may irk some developers who depend on immediate changes to 
> the catalog on reindex.

Yeah, it also makes things harder to test. Unit tests need objects to 
be indexed immediately. That one problem aside, I think deferring 
_should_ be the way to go, so there should be a "flush all pending 
indexing" call, which should keep everyone happy. We just have to make 
sure it doesn't get misused and end up being called 100 times per 
operation ;-)
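
A minimal sketch of deferred indexing with an explicit flush. The `IndexQueue` class and its methods are hypothetical names, not an existing CMF or Zope API; in a real implementation the flush would presumably be hooked to the transaction boundary, which this toy does not attempt:

```python
class IndexQueue:
    """Collects reindex requests and collapses them to one per object."""

    def __init__(self, catalog_func):
        # catalog_func(uid, obj, idxs) does the real indexing work.
        self._catalog_func = catalog_func
        self._pending = {}  # uid -> (obj, set of index names, or None = all)

    def reindex(self, uid, obj, idxs=None):
        already_full = uid in self._pending and self._pending[uid][1] is None
        if idxs is None or already_full:
            # A full reindex supersedes any partial requests.
            self._pending[uid] = (obj, None)
        else:
            previous = self._pending.get(uid, (obj, set()))[1]
            self._pending[uid] = (obj, previous | set(idxs))

    def flush(self):
        """Run all pending indexing now; returns the number of objects done."""
        pending, self._pending = self._pending, {}
        for uid, (obj, idxs) in pending.items():
            self._catalog_func(uid, obj, idxs)
        return len(pending)


calls = []
queue = IndexQueue(lambda uid, obj, idxs: calls.append((uid, idxs)))

doc = object()
queue.reindex("doc1", doc, ["review_state"])
queue.reindex("doc1", doc, ["allowedRolesAndUsers"])
queue.reindex("doc1", doc)   # full reindex requested last
queue.flush()                # a unit test would call this before querying
```

Three requests for the same object collapse into one catalog call, and a second flush with nothing pending does no work, so calling it too often is cheap but still worth avoiding.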

cheers,

Chris

-- 
Simplistix - Content Management, Zope & Python Consulting
            - http://www.simplistix.co.uk

