[Zope] Re:Re: eliminating dupes in a list (Tres Seaver)

Wed, 05 Apr 2000 19:04:27 -0400

>sathya <linuxcraft@redspice.com> asked:
>
>> I have a list to pass in as a parameter to dtml-in, but before doing that I
>> would like to eliminate duplicates from the list,
>> i.e. in ['1','2','1'] I want to skip the duplicate 1. Is there a Zope hack for
>> this, or do I have to use an external method?
>
>This requires some Python expression trickery which can't (currently) be done
>within DTML (filter and map aren't available to DTML).  You are probably better
>off using a PythonMethod for such logic.  For grins, I used the Python
>interpreter to bang out the following Python expression:
>
>  filter( None, map( lambda i, d={}:
>                        ( i, None )[ d.has_key(i) or d.update( {i: 1} ) or 0 ]
>                   , foo ) )
>
>This is too convoluted to use in production code (and it strips out 0 values,
>too) -- much better a nice, straightforward, "Pythonic" solution, a Python
>method 'uniq' taking a single argument, 'items':
>
> d = {}
> for item in items:
>     if not d.has_key( item ):
>        d.update( { item: 1 } )
> return d.keys()
>
>Call from DTML:
>
>  <dtml-in "uniq( myItems )" sort>
>    ...
>  </dtml-in>
>-- 
>=========================================================
>Tres Seaver  tseaver@digicool.com   tseaver@palladion.com
>
>
Your uniq method is not as fast as it could be. The call to has_key is superfluous, and the update call has to create a dictionary which then gets thrown away.

All you need to do is:
def uniq2(items):
    d = {}
    for item in items:
        d[item]=1
    return d.keys()

This saves creating a dictionary and hashing the key twice for every item. In the timings below it runs about 2-3 times faster.
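
For readers on a current Python, the same idea can be sketched with built-ins. This is an aside, not part of the original benchmark: it assumes Python 3.7 or later (where dicts keep insertion order), and the names uniq_unordered and uniq_ordered are hypothetical. As with uniq2, every item must be hashable.

```python
def uniq_unordered(items):
    """Return the distinct items; the order of the result is not guaranteed."""
    return list(set(items))


def uniq_ordered(items):
    """Return the distinct items in first-seen order.

    Assumption: Python 3.7+, where dict preserves insertion order.
    """
    return list(dict.fromkeys(items))


print(uniq_ordered(['1', '2', '1']))
```

The ordered variant is the closer match when the result is going to be displayed, although the DTML call in the quoted message sorts the result anyway.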

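Test results:
The integer test uses a list of 10000 random integers; the string test uses the words from a text file.
Each line shows the function, the number of unique keys, and the elapsed time in seconds.
For string keys there is a small advantage to not being the first run.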
bash-2.02$ python uniq.py
Integer keys 
uniq1 6214 0.0955042775547
uniq2 6214 0.0308158706009

String keys
14650 words in file
uniq1 3521 0.0922106302846
uniq2 3521 0.036688347093
Second time around is a bit faster
uniq1 3521 0.088270029681
uniq2 3521 0.036336634824
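
The "2-3 times faster" figure can be checked against the timings reported above. The short sketch below (modern Python 3 syntax, hypothetical variable names) simply divides each uniq1 time by the matching uniq2 time.

```python
# Timings copied from the run above: (label, uniq1 seconds, uniq2 seconds).
timings = [
    ("integer keys", 0.0955042775547, 0.0308158706009),
    ("string keys, first run", 0.0922106302846, 0.036688347093),
    ("string keys, second run", 0.088270029681, 0.036336634824),
]

# Speedup of uniq2 over uniq1 for each pair of measurements.
speedups = {label: t1 / t2 for label, t1, t2 in timings}

for label, ratio in speedups.items():
    print("%-24s uniq2 is %.1fx faster" % (label, ratio))
```

The ratios fall between roughly 2.4 and 3.1, which is consistent with the claim for this single run.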

Test code (edit the open() call to name a text file to read into the list of words):
===========

import time
import whrandom
import string

def uniq1(items):
    d = {}
    for item in items:
        if not d.has_key( item ):
            d.update( { item: 1 } )
    return d.keys()

def uniq2(items):
    d = {}
    for item in items:
        d[item]=1
    return d.keys()

generator = whrandom.whrandom()

# Read some text file into a list of words; this one is a Zope mailing list file of about 90K
listofwords = string.split(open('c:/temp/message.txt').read())

listofitems = []
for item in range(10000):
    listofitems.append(generator.randint(0,9500))

print 'Integer keys'
starttime = time.clock()
l = uniq1(listofitems)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

# Reset the timer so the uniq2 figure does not include the uniq1 run
starttime = time.clock()
l = uniq2(listofitems)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime

print
print 'String keys'
print len(listofwords), 'words in file'

starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime

print 'Second time around is a bit faster'
starttime = time.clock()
l = uniq1(listofwords)
stoptime = time.clock()
print 'uniq1', len(l), stoptime-starttime

starttime = time.clock()
l = uniq2(listofwords)
stoptime = time.clock()
print 'uniq2', len(l), stoptime-starttime
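
The listing above targets the Python of 2000: whrandom, dict.has_key, the print statement and time.clock are not available in current Python 3. A minimal sketch of the same comparison for a modern interpreter might look like the following. It is an approximation, not the original test: the helper name timed and the fixed random seed are assumptions, the string-key part is left out because it depends on a local file, and relative timings on a modern interpreter may differ from the figures reported above.

```python
import random
import time


def uniq1(items):
    # Same shape as the original uniq1: membership test, then update()
    # with a throwaway one-item dict.
    d = {}
    for item in items:
        if item not in d:
            d.update({item: 1})
    return list(d.keys())


def uniq2(items):
    # Plain assignment: one hash lookup per item, no temporary dict.
    d = {}
    for item in items:
        d[item] = 1
    return list(d.keys())


def timed(func, items):
    """Return (number of unique items, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(items)
    elapsed = time.perf_counter() - start
    return len(result), elapsed


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed is an assumption, for repeatability
    listofitems = [rng.randint(0, 9500) for _ in range(10000)]
    print("Integer keys")
    for func in (uniq1, uniq2):
        count, seconds = timed(func, listofitems)
        print(func.__name__, count, seconds)
```

For steadier numbers, each function could be run several times and the best time kept, since a single timing is sensitive to whatever else the machine is doing.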