list_to_dict()

Peter Abel p-abel at t-online.de
Mon Jan 20 14:14:19 EST 2003


Martin Maney <maney at pobox.com> wrote in message news:<b0ga40$nr8$1 at wheel2.two14.net>...
> Peter Abel <p-abel at t-online.de> wrote:
> > >>> def myzip(keys,values):
> > ...     res={}
> > ...     map(lambda x,y:res.setdefault(x,y),keys,values)
> > ...     return res
> > 
> >   And I think it goes with python 2.1. or earlier.
> >   A little time_test(number_of_keys) - function which generates
> >   simple 5-random-letters-keys and a value-list as range(number_of_keys)
> >   will show an increase in speed.
> > >>> time_test(50000)
> > dzip  :  0.28100001812
> > myzip :  0.140999913216
> > >>> time_test(500000)
> > dzip  :  3.31200003624
> > myzip :  1.59399998188
> 
> That's odd.  When I dropped the map(...) code into my simple test, it
> was the worst performer at all sizes.  :-(
> 
> Oh, wait - are you testing building single dictionaries with 50K keys? 
> My largest test is with 1000 keys per dictionary, and the results that
> actually matter would be with perhaps 5 to 10 keys.  Try building a dict
> of 10 items 5K times instead.  Not that the trend I see at 1000
> keys/dict and below makes the numbers you reported seem likely even
> so...
> 
> Building dicts of 5 items 40000 times (CPU time per Kdict):
>         traditional: 0.0190
> traditional inlined: 0.0162
>           dict(zip): 0.0255
>   inlined dict(zip): 0.0223
>                 map: 0.0337
>         inlined map: 0.0302
> Building dicts of 10 items 20000 times (CPU time per Kdict):
>         traditional: 0.0285
> traditional inlined: 0.0260
>           dict(zip): 0.0320
>   inlined dict(zip): 0.0295
>                 map: 0.0535
>         inlined map: 0.0500
> Building dicts of 100 items 4000 times (CPU time per Kdict):
>         traditional: 0.2000
> traditional inlined: 0.2025
>           dict(zip): 0.1650
>   inlined dict(zip): 0.1600
>                 map: 0.4125
>         inlined map: 0.4100
> Building dicts of 1000 items 400 times (CPU time per Kdict):
>         traditional: 2.0500
> traditional inlined: 2.0500
>           dict(zip): 1.5750
>   inlined dict(zip): 1.5500
>                 map: 3.9000
>         inlined map: 4.0750
> 
> The changes in the total numbers of keys inserted were made to keep the
> overall running times about the same - between two and three seconds
> on the P2/233 I originally did this on.  As you can see, I've since
> changed the format to report the time per 1000 dictionaries to make the
> shape of the scaling apparent.
> 
> The "inlined" versions simply place the body of the functions into the
> body of the test loop, of course.  I can't now recall why I called the
> dzip() code "traditional", unless it was just that that's what I have
> been using in a couple of projects.
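
For reference, the three variants compared in the quoted figures can be sketched roughly as follows. This is only an illustration under some assumptions: the function names are hypothetical (the actual test harness was not posted), and the code is written for Python 3, where map() is lazy and has to be consumed explicitly.

```python
# Three ways to build a dict from parallel lists of keys and values.
# Function names are illustrative; they are not taken from the thread's harness.

def dzip_loop(keys, values):
    """'Traditional' version: explicit loop with item assignment."""
    res = {}
    for k, v in zip(keys, values):
        res[k] = v
    return res

def dzip_builtin(keys, values):
    """dict(zip) version."""
    return dict(zip(keys, values))

def dzip_map(keys, values):
    """map/setdefault version. list() forces the lazy Python 3 map()
    to run so that the setdefault side effects actually happen."""
    res = {}
    list(map(lambda x, y: res.setdefault(x, y), keys, values))
    return res

keys = ["a", "b", "a"]
values = [1, 2, 3]
print(dzip_loop(keys, values))
print(dzip_builtin(keys, values))
print(dzip_map(keys, values))
```

One point worth keeping in mind when comparing them: setdefault() only stores a value if the key is not yet present, so with duplicate keys the map version keeps the first value while the other two keep the last.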

I give up!!
Until today I thought I could say that I know a little bit
about Python programming. But from now on I'm not quite sure
if I know how to write  P Y T H O N ? ?

I tried the following code:
import random,time,gc

# An inline-function to generate random keys with n letters
# between chr(65) and chr(90) [A-Z]
genkey=lambda n:reduce(lambda last_chars,next_char:last_chars+next_char,
                       map(lambda n:chr(random.randint(65,90)),range(n)),'')

def test(n_keys=1000,key_items=100,item_len=5):
  print '-'*60
  print "Key-Generation for %d keys with %d items/key and each item %d chars long"%(n_keys,key_items,item_len)
  # It's not very readable, but I hope it's quicker than a for-loop
  keys=[ tuple( map(lambda n:genkey(item_len),range(key_items) ))
         for k in range(n_keys)]
  # Values are simpler
  values=range(n_keys)
 
  # Here we go

  # Your dzip-function
  res = {}
  st1=time.time()
  for k,v in zip(keys, values):
    res[k] = v
  et1=time.time()
  dzip_time=et1-st1
  print 'dzip :','Time total:',dzip_time,'Time/key  :',dzip_time/n_keys

  # Some garbage-collection
  # to have similar conditions
  res = {}
  gc.collect()
  
  # myzip-function
  st2=time.time()
  map(lambda x,y:res.setdefault(x,y),keys,values)
  et2=time.time()
  mytime=et2-st2
  print 'myzip:','Time total:',mytime,'Time/key  :',mytime/n_keys
  if mytime:
    print 'Timerelation dzip/myzip',dzip_time/mytime
  else:
    print "Can't calculate Timerelation dzip/myzip -> ZeroDivisionError"
test()
test(500,50,10)
test(2000,200,2)
print
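
The key generation in the script above can also be written without nested lambdas. The following is only a sketch, assuming Python 3; make_keys is a hypothetical helper name, and the keys have the same shape as in the script (tuples of random A-Z strings).

```python
import random
import string

def genkey(n):
    """Return a string of n random uppercase letters A-Z."""
    return "".join(random.choice(string.ascii_uppercase) for _ in range(n))

def make_keys(n_keys, key_items, item_len):
    """Build n_keys tuples, each holding key_items random strings
    of item_len characters (hypothetical helper, not from the post)."""
    return [tuple(genkey(item_len) for _ in range(key_items))
            for _ in range(n_keys)]

# Small example: 3 keys, 2 items per key, 5 characters per item.
keys = make_keys(3, 2, 5)
print(keys)
```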

I ran it 3 times under W2K in a DOS box and, believe it
or not, I got 3 different results, as you can see:

M:\Python\XML-XSL\Tests>list_to_dict.py
------------------------------------------------------------
Key-Generation for 1000 keys with 100 items/key and each item 5 chars long
dzip : Time total: 0.0199999809265 Time/key  : 1.99999809265e-005
myzip: Time total: 0.00999999046326 Time/key  : 9.99999046326e-006
Timerelation dzip/myzip 2.0
------------------------------------------------------------
Key-Generation for 500 keys with 50 items/key and each item 10 chars long
dzip : Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
myzip: Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
Timerelation dzip/myzip 1.0
------------------------------------------------------------
Key-Generation for 2000 keys with 200 items/key and each item 2 chars long
dzip : Time total: 0.0299999713898 Time/key  : 1.49999856949e-005
myzip: Time total: 0.0299999713898 Time/key  : 1.49999856949e-005
Timerelation dzip/myzip 1.0


M:\Python\XML-XSL\Tests>list_to_dict.py
------------------------------------------------------------
Key-Generation for 1000 keys with 100 items/key and each item 5 chars long
dzip : Time total: 0.0199999809265 Time/key  : 1.99999809265e-005
myzip: Time total: 0.00999999046326 Time/key  : 9.99999046326e-006
Timerelation dzip/myzip 2.0
------------------------------------------------------------
Key-Generation for 500 keys with 50 items/key and each item 10 chars long
dzip : Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
myzip: Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
Timerelation dzip/myzip 1.0
------------------------------------------------------------
Key-Generation for 2000 keys with 200 items/key and each item 2 chars long
dzip : Time total: 0.039999961853 Time/key  : 1.99999809265e-005
myzip: Time total: 0.0199999809265 Time/key  : 9.99999046326e-006
Timerelation dzip/myzip 2.0


M:\Python\XML-XSL\Tests>list_to_dict.py
------------------------------------------------------------
Key-Generation for 1000 keys with 100 items/key and each item 5 chars long
dzip : Time total: 0.0199999809265 Time/key  : 1.99999809265e-005
myzip: Time total: 0.0 Time/key  : 0.0
Can't calculate Timerelation dzip/myzip -> ZeroDivisionError
------------------------------------------------------------
Key-Generation for 500 keys with 50 items/key and each item 10 chars long
dzip : Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
myzip: Time total: 0.00999999046326 Time/key  : 1.99999809265e-005
Timerelation dzip/myzip 1.0
------------------------------------------------------------
Key-Generation for 2000 keys with 200 items/key and each item 2 chars long
dzip : Time total: 0.039999961853 Time/key  : 1.99999809265e-005
myzip: Time total: 0.0199999809265 Time/key  : 9.99999046326e-006
Timerelation dzip/myzip 2.0
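
Every total in these runs is either 0.0 or close to a multiple of 0.01 s, which suggests that each measured interval is about as short as the granularity of time.time() on this system, so one pass per variant probably cannot separate the two. A possible way around that is to repeat each build many times and take the best trial. The following is only a sketch, assuming Python 3 and the standard timeit module; best_time and the build_* names are hypothetical.

```python
import timeit

def build_loop(keys, values):
    """Explicit loop, as in the dzip part of the script."""
    res = {}
    for k, v in zip(keys, values):
        res[k] = v
    return res

def build_map(keys, values):
    """map/setdefault, as in the myzip part; list() consumes the lazy map."""
    res = {}
    list(map(lambda x, y: res.setdefault(x, y), keys, values))
    return res

def best_time(func, keys, values, number=200, repeat=5):
    """Call func(keys, values) `number` times per trial and return the
    best per-call time in seconds over `repeat` trials."""
    timer = timeit.Timer(lambda: func(keys, values))
    return min(timer.repeat(repeat=repeat, number=number)) / number

keys = [str(i) for i in range(1000)]
values = list(range(1000))
for name, func in (("loop", build_loop), ("map", build_map)):
    print("%-5s %.3e s per dict" % (name, best_time(func, keys, values)))
```

Taking the minimum over several trials is a common choice here because slower trials mostly reflect interference from other activity on the machine rather than the code under test.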

And the winner is . . .   (forget it!!)

Cheers 
Peter



