splitting one dictionary into two

Robert Brewer fumanchu at amor.org
Thu Apr 1 12:41:03 EST 2004


[jsaul]
> > I have to split a dict into two dicts. Depending on their values,
> > the items shall remain in the original dict or be moved to another
> > one and at the same time be removed from the original dict.
> > 
> >     dict1 = { "a":1, "b":3, "c":5, "d":4, "e":2 }
> >     dict2 = {}
> >     klist = []
> > 
> >     for key in dict1:
> >         if dict1[key] > 3: # some criterion
> >             dict2[key] = dict1[key]
> >             klist.append(key)
> > 
> >     for key in klist:
> >         del dict1[key]
> > 
> >     print dict1
> >     print dict2
> > 
> > That means that I store the keys of the items to be removed from
> > the original dict in a list (klist) and subsequently remove the
> > items using these keys.
> > 
> > Is there an "even more pythonic" way?

---- cleave.py ----

def cleave(mapping, truthfunc):
    """Solution by Robert Brewer."""
    dict1, dict2 = {}, {}
    while mapping:
        key, value = mapping.popitem()
        if truthfunc(key, value):
            dict2[key] = value
        else:
            dict1[key] = value
    return dict1, dict2

def extractbykey(mapping, truthfunc):
    """Solution by Peter Otten."""
    dict2 = {}
    for key in mapping:
        if truthfunc(key, mapping[key]):
            dict2[key] = mapping[key]
    
    for key in dict2:
        del mapping[key]
    
    return mapping, dict2

def inplace(mapping, truthfunc):
    """Solution by wes weston."""
    dict2 = {}
    for key, value in mapping.items():
         if truthfunc(key, value): # some criterion
             dict2[key] = value
             del mapping[key]

--- end cleave.py ---

>>> t1 = timeit.Timer("cleave.cleave(m, lambda k, v: v % 2)", "import
cleave\nm = {}\nfor i in xrange(100):\n    m[i] = i")
>>> t2 = timeit.Timer("cleave.extractbykey(m, lambda k, v: v % 2)",
"import cleave\nm = {}\nfor i in xrange(100):\n    m[i] = i")
>>> t3 = timeit.Timer("cleave.inplace(m, lambda k, v: v % 2)", "import
cleave\nm = {}\nfor i in xrange(100):\n    m[i] = i")

>>> t1.timeit(100000)
0.28098654996654204
>>> t2.timeit(100000)
5.6455228248282765
>>> t3.timeit(100000)
6.3886250906189161

...this also holds for larger dicts:

>>> t1 = timeit.Timer("cleave.cleave(m, lambda k, v: v % 2)", "import
cleave\nm = {}\nfor i in xrange(1000000):\n    m[i] = i")
>>> t2 = timeit.Timer("cleave.extractbykey(m, lambda k, v: v % 2)",
"import cleave\nm = {}\nfor i in xrange(1000000):\n    m[i] = i")
>>> t3 = timeit.Timer("cleave.inplace(m, lambda k, v: v % 2)", "import
cleave\nm = {}\nfor i in xrange(1000000):\n    m[i] = i")

>>> t1.timeit(10)
2.1568995500824828
>>> t2.timeit(10)
7.0625512460382538
>>> t3.timeit(10)
34.820299227974488


[wes weston]
>     I'll have a hack with small code. Nice that you can
> iterate while removing.

...yes, but you quickly pay a price by using .items(), which IIRC
creates an intermediate dict to hold the k/v pairs.

I'm not a master on the internal workings of dict, but I *believe*
cleave() will scale better, since it neither introduces an intermediary
structure, nor is a key/value pair ever present in more than one dict at
the same time. Someone correct me if I'm wrong.


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org




More information about the Python-list mailing list