[Zope-dev] SVN: Zope/branches/2.13/ fix `LazyMap` to avoid unnecessary function calls when not accessing items in order (fixes http://dev.plone.org/plone/ticket/9018)

Andreas Zeidler az at zitc.de
Wed Dec 15 03:54:58 EST 2010


hi tres,

On 14.12.2010, at 18:11, Tres Seaver wrote:
> On 12/14/2010 09:12 AM, Andreas Zeidler wrote:
>> Modified: Zope/branches/2.13/src/Products/ZCatalog/Lazy.py
>> ===================================================================
>> --- Zope/branches/2.13/src/Products/ZCatalog/Lazy.py	2010-12-14 13:24:24 UTC (rev 118862)
>> +++ Zope/branches/2.13/src/Products/ZCatalog/Lazy.py	2010-12-14 14:12:11 UTC (rev 118863)
>> @@ -145,44 +145,28 @@
>>    # Don't access data until necessary
>> 
>>    def __init__(self, func, seq, length=None):
>> -        self._seq = seq
>> -        self._data = []
>> -        self._func = func
>> +        self._seq=seq
>> +        self._func=func
>>        if length is not None:
>> -            self._len = length
>> +            self._len=length
> 
> Please don't un-PEP8 a cleaned-up module:  leave spaces around
> operators.  If that was unintentional (maybe you pasted from a copy of a
> file made before the module was cleaned up?), then you still need to
> review the diff and edit out unintended changes first.

yes, in was indeed unintentional, or rather unfortunate.  i've originally worked on the 2.12 branch (in a plone 4 buildout) where i think hanno's pep8-related changes haven't been applied/back-ported yet.  after that i pasted the version into 2.13 (and yes, the commit order was the other way round :)) and was in fact wondering about the longer diff when hanno (who was actually sitting next to me) pointed out he had changed the one-line if: and else: expressions on that branch.  however, my alias for "svn diff" tells diff to not show white space changes so i missed the now again removed spaces around the =.

anyway, it's fixed in c118895.

>>        else:
>>            self._len = len(seq)
>> +        self._marker = object()
>> +        self._data = [self._marker] * self._len
> 
> Have you measured the impact of pre-allocating here for very large
> sequences?

i had not, but have now.  turns out the allocation makes a difference earlier than i would have expected:  accessing the first 20 items of the map is slower with the new version starting at sequences of a little more than 2000 items.  and of course it (relatively) gets slower the longer the sequence is.  setting up a 10^5 "empty" sequence takes about 0.5 ms on my machine.

> I'm *sure* that is a not a good trade-off for all cases:  the catalog is
> often used for queries which return result sets in the order of 10^5 or
> more objects, and *very* rarely is it accessed randomly (I had never
> seen it before reading the bug entry you cite).  The normal case is to
> take the first few entries (batch_size * batch_number) from the huge set.

true, but the fix wasn't only about this use-case.  it's mainly about avoiding extra function calls when fetching a batch other than the first.  so far the map function wall called for every item up to the one accessed, e.g. for the 5th batch of 20 items each, you'd end up with 80 unnecessary (and possible expensive) calls.  or iow, the old version wasn't really as lazy as it should have been… :)

> I think you would be better off finding a way to inject your style of
> LazyMap into the catalog for the random-access case, e.g., by passing
> '_lazy_map' into the methods you are using.

i disagree.  i think it should be fixed properly upstream.  we should really avoid all that monkey-patching all over the place.  but see below…

> BTW, another interesting alternative impplementation might be to use a
> dictionary instead of a list for 'data'.  You could then avoid any
> pre-allocation at all.

with the benchmark in place, i was curious and tried that as well:  it beats the other versions independently of the sequence length and the number of accessed items, both sequentially and repeatedly over a single batch.  in the benchmark they're accessed from the start, but skipping a few batches would make it even faster anyway.  hence i've updated the code again in c118896.  please let me know what you think, or if you want the test script i've been using.

> While we're at it, the 'try: ... except:'  here is wasteful and slow.
> Lazy objects aren't persistent, and the '_seq' attribute is added
> unconditionally in '__init__'.

you're right, this should be cleaned up as well.  but since this isn't "my code" i was trying to keep the diff minimal (like on 2.12, i.e. in c118864, it obviously didn't work in the other one :)).  in any case, it's also gone with c118896.

cheers,


andi
		
-- 
pgp key at http://zitc.de/pgp - http://wwwkeys.de.pgp.net/
plone 4.0 released! -- http://plone.org/4

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 195 bytes
Desc: This is a digitally signed message part
Url : http://mail.zope.org/pipermail/zope-dev/attachments/20101215/b5b87090/attachment.bin 


More information about the Zope-Dev mailing list