[Zope-dev] catalog performance: query plan

Hedley Roos hedleyroos at gmail.com
Mon Nov 10 11:38:10 EST 2008


>> object_implements    |KeywordIndex     |0.2172234|         4.6
>
> This is clearly not the same issue as the other KeywordIndexes:  in
> fact, I am astonished that anybody would be using a KeywordIndex for
> this at all.  I would suspect that the real problem here is in the
> application, rather than the index itself.

The app is Plone :)

>> portal_type          |FieldIndex       |0.0025984|      384.84
>
> This one is surprising:  its performance should be pretty similar to the
>  other FieldIndexes (e.g., 'review_state') which map a controlled
> vocabulary onto the entire corpus.  Was the query different than
> 'review_state' (e.g., multi-valued vs. single-valued)?

portal_type queries are usually multivalued in Plone.

>> sourceUID            |FieldIndex       |0.0004886|     2046.31
>
> Probably bogus, but I don't know how it is used.


Plone's reference_catalog

>
>> UID                  |FieldIndex       |0.0003070|      3257.1
>
> Note that this is the worst-case scenario for a FieldIndex:  there is
> exactly one value for every key.  This shouldn't be "indexed" at all, in
> fact, beyond a simple BTree (UID -> rid).

I've never even thought of that. Perhaps the catalog is used to
present a familiar API.
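
To make the "simple BTree (UID -> rid)" suggestion concrete, here is a minimal sketch of a unique-key lookup. It assumes a plain Python dict as a stand-in for a BTree, and the class and method names are hypothetical, not part of ZCatalog or Plone's reference_catalog.

```python
# Hypothetical sketch: one UID maps to exactly one record id (rid).
# A plain dict stands in for a BTree so the example is self-contained.

class UniqueKeyLookup:
    def __init__(self):
        self._uid_to_rid = {}

    def index(self, uid, rid):
        """Register a UID; refuse to map one UID to two different rids."""
        existing = self._uid_to_rid.get(uid)
        if existing is not None and existing != rid:
            raise ValueError("UID %r is already mapped to rid %r" % (uid, existing))
        self._uid_to_rid[uid] = rid

    def unindex(self, uid):
        """Forget a UID; unknown UIDs are ignored."""
        self._uid_to_rid.pop(uid, None)

    def lookup(self, uid):
        """Return the rid for a UID, or None if it is not indexed."""
        return self._uid_to_rid.get(uid)


lookup = UniqueKeyLookup()
lookup.index("uid-a", 1)
lookup.index("uid-b", 2)
```

With one value per key there are no per-value sets of rids to build or intersect, which is the point of the remark quoted above.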

>
>> targetUID            |FieldIndex       |0.0002287|     4372.12
>
> I don't know what this one is used for, but it should probably be
> scrapped as well.

More reference_catalog.

>
>> exact_getUserId      |FieldIndex       |0.0001931|     5177.79
>> exact_getUserName    |FieldIndex       |0.0001816|     5504.39
>
> I don't know how the application uses either of those indexes, but they
> are almost certainly bogus in any normal catalog.

Membrane and remember. They're currently tied to Plone but efforts are
being made to make them work with CMF.

>> relationship         |FieldIndex       |0.0000822|     12153.1
>> id                   |FieldIndex       |0.0000822|    12161.81
>> end                  |DateIndex        |0.0000623|    16027.48
>> getGroups            |FieldIndex       |0.0000278|    35973.45
>
> This is almost certainly bogus:  FieldIndex is not supposed to be used
> with multi-valued terms.

Plone stuff, but I am intrigued by your statement. Why can FieldIndex
not be used with multi-valued terms?

>> Subject              |KeywordIndex     |0.0000253|    39413.57
>
> This is the use-case for which KeywordIndex is designed.  Was the query
> just a single term, by chance?

The simplest query is a list containing only a single term (not counting
the trivial case). It should be worse with more terms, right?
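
A rough sketch of why more terms should cost more, assuming the query uses "or" semantics, so the result is the union of one set of rids per term. The dict-of-sets structure and the function names are hypothetical simplifications, not the real KeywordIndex code.

```python
# Hypothetical simplification of a keyword index: keyword -> set of rids.

def index_keywords(index, rid, keywords):
    """Record that the object with this rid carries each of the keywords."""
    for keyword in keywords:
        index.setdefault(keyword, set()).add(rid)


def query_or(index, terms):
    """Return rids matching any term; one lookup and one union per term."""
    result = set()
    for term in terms:
        result |= index.get(term, set())
    return result


subject_index = {}
index_keywords(subject_index, 1, ["zope", "catalog"])
index_keywords(subject_index, 2, ["plone"])
index_keywords(subject_index, 3, ["catalog", "plone"])

single_term = query_or(subject_index, ["zope"])
two_terms = query_or(subject_index, ["zope", "plone"])
```

Each extra term adds another lookup and another set union, so under this assumption the work grows with the number of terms and with the size of their result sets.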


>> Title                |ZCTextIndex      |0.0000128|    77809.46
>
> This should be removed:  there is no valid use case for doing a
> full-text search restricted only to the title.

Plone specific.


>> Description          |ZCTextIndex      |0.0000116|    86241.39
>
> Again, should be removed.

Again, Plone specific :)


>> getEmail             |ZCTextIndex      |0.0000113|    87849.05
>
> Should *definitely* be removed:  how can you do full-text search on an
> e-mail address?

I think Membrane is responsible for this, but you're right.


>> SearchableText       |TextIndex        |0.0000113|    88466.69
>
> Where did this one come from?  The 'SearchableText' above is a ZCTextIndex.

Membrane!

It's kinda pointless for me to continue, since this is turning into a
Plone-specific discussion on zope-dev. But at least the whole exercise
has forced us to look in detail at how all these indexes affect
performance with a ZODB holding very many objects.

Roche investigated Tesdal's queryplan today and it seems to solve
nearly all our performance problems. He'll have to elaborate.
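
For readers unfamiliar with the term, the general idea of a query plan can be sketched as below. This illustrates the technique in general and makes no claim about how Tesdal's queryplan is implemented: it assumes each index has already produced a set of rids, and that result sizes from an earlier run are available. All names are hypothetical.

```python
# Hypothetical sketch of a query plan: intersect the smallest result sets
# first so that later intersections work on as few rids as possible.

def make_plan(result_sizes):
    """Order index names by the result size observed on an earlier run."""
    return sorted(result_sizes, key=result_sizes.get)


def run_query(index_results, plan):
    """Intersect per-index rid sets in plan order, stopping when empty."""
    result = None
    for name in plan:
        matches = index_results[name]
        result = set(matches) if result is None else result & matches
        if not result:
            break  # nothing can match any more, skip the remaining indexes
    return result if result is not None else set()


# Result sizes remembered from a previous, similar query (made-up numbers).
sizes = {"portal_type": 3, "UID": 1, "review_state": 2}
plan = make_plan(sizes)

index_results = {
    "portal_type": {1, 2, 3},
    "review_state": {2, 3},
    "UID": {3},
}
matching = run_query(index_results, plan)
```

Starting from the most selective index keeps the intermediate result small, and an empty intermediate result ends the query early.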

Hedley

