need optimizing help

Robert Brewer fumanchu at amor.org
Sat Mar 13 15:33:51 EST 2004


rabbits77 wrote:
> I have a dictionary with a very very large(possibly millions) of
> key/value pairs.
> The key is a tuple that looks like (id,date)
> What is the fastest way to get out all of the values that match any
> key given that they individual key elements are coming from two
> seperate lists?
> The approach of
> for id in IDS:
>     for date in dates:
>         data=myDict[(id,date)]
> 
> seems to just take too long. Is there a speedier, more pythonic, way
> of doing this? Any help speeding this up would be much appreciated!!

If you're willing to handle some minor side-effects, one common approach
is an index layer via a nested dict; that is, instead of:

myDict[(id, date)] = value

...you execute:

myIndex.setdefault(id, {})[date] = value

You would then grab values via:

for id in IDS:
    bucket = myIndex[id]
    for date in dates:
        data = bucket[date]

The proof is in the pudding:


----indextest.py----

def flatdict():
    flat = {}
    for id in range(1000):
        for date in range(1000):
            flat[(id, date)] = 'notlob'
    return flat

def grab_flat(myDict):
    for id in range(475, 525):
        for date in range(990, 1000):
            data = myDict[(id, date)]

def indexdict():
    index = {}
    for id in range(1000):
        for date in range(1000):
            index.setdefault(id, {})[date] = 'notlob'
    return index

def grab_index(myDict):
    for id in range(475, 525):
        for date in range(990, 1000):
            data = myDict[id][date]


PythonWin 2.3.2 (#49, Oct  2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)]
on win32.
Portions Copyright 1994-2001 Mark Hammond (mhammond at skippinet.com.au) -
see 'Help/About PythonWin' for further copyright information.
>>> import timeit
>>> t1 = timeit.Timer("fd = i.flatdict()", "import indextest as i")
>>> t2 = timeit.Timer("idx = i.indexdict()", "import indextest as i")
>>> t3 = timeit.Timer("i.grab_flat(fd)", "import indextest as i; fd =
i.flatdict()")
>>> t4 = timeit.Timer("i.grab_index(idx)", "import indextest as i; idx =
i.indexdict()")
>>> t1.timeit(1)
15.798860190331453
>>> t2.timeit(1)
1.3386880176111617
>>> t3.timeit(1000)
0.5080377534016236
>>> t4.timeit(1000)
0.36944779294577756


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org




More information about the Python-list mailing list