unique values of a Dictionary list (removing duplicate elements of a list)

Peter Otten __peter__ at web.de
Fri May 21 09:46:55 EDT 2010


Chad Kellerman wrote:

> On Fri, May 21, 2010 at 8:07 AM, Chad Kellerman <sunckell at gmail.com>
> wrote:
> 
>>
>>
>> On Fri, May 21, 2010 at 7:50 AM, Peter Otten <__peter__ at web.de> wrote:
>>
>>> Chad Kellerman wrote:
>>>
>>> > Python users,
>>> >       I am parsing an AIX trace file and creating a dictionary containing
>>> > keys (PIDs) and values (a list of TIDs), with PIDs being unique process
>>> > ids and TIDs being a list of thread ids.  My function populates the keys
>>> > so that they are unique, but my list contains duplicates.
>>> >
>>> >      Can someone point me in the right direction so that my dictionary
>>> > value does not contain duplicate elements?
>>> >
>>> >
>>> > Here is what I have.
>>> >
>>> > --------------<portion of code that is relevant>------------------
>>> >
>>> > pidtids  = {}
>>> >
>>> > # --- function to add pid and tid to a dictionary
>>> > def addpidtids(pid,tid):
>>> >     pidtids.setdefault(pid,[]).append(tid)
>>>
>>> Use a set instead of a list (and maybe a defaultdict):
>>>
>>> from collections import defaultdict
>>>
>>> pidtids = defaultdict(set)
>>>
>>> def addpidtids(pid, tid):
>>>    pidtids[pid].add(tid)
>>>
>>> Peter
>>>
>>
>> Thanks.  I guess I should have posted this in my original question.
>>
>> I'm on 2.4.3; it looks like defaultdict is new in 2.5.
>>
>> I'll see if I can upgrade.
>>
>> Thanks again.
>>
> 
> 
> Instead of upgrading (it would probably be faster to use techniques
> available in 2.4.3):
> 
> Couldn't I check to see if the pid exists (has_key, I believe) and then
> check whether the tid is already a value in the list for that key, prior
> to passing it to the function?
> 
> Or would that be too 'expensive'?

No.

pidtids = {}
def addpidtids(pid, tid):
    if pid in pidtids:
        pidtids[pid].add(tid)
    else:
        pidtids[pid] = set((tid,))

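should be faster than the setdefault() version, which builds a new empty
set on every call, even when pid is already a key: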

def addpidtids(pid, tid):
    pidtids.setdefault(pid, set()).add(tid)

and both should work in Python 2.4, where set became a built-in type.
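A small harness can check that both set-based versions remove the duplicate tids and can roughly compare their speed. This is a sketch, not code from the thread: the helper names (add_with_check, add_with_setdefault, add_with_defaultdict, build) and the SAMPLE records are hypothetical, the dictionary is passed in as a parameter instead of the global pidtids, and the harness itself targets current Python 3 (timeit.timeit and print() need 2.6 or later). The defaultdict(set) variant is included only for comparison, since it needs 2.5+.

```python
import timeit
from collections import defaultdict


def add_with_check(table, pid, tid):
    """Adapted from Peter's first variant: a set is created only for a new pid."""
    if pid in table:
        table[pid].add(tid)
    else:
        table[pid] = set((tid,))


def add_with_setdefault(table, pid, tid):
    """Adapted from Peter's second variant: set() is evaluated on every call,
    even when pid is already a key, and is then discarded."""
    table.setdefault(pid, set()).add(tid)


def add_with_defaultdict(table, pid, tid):
    """The defaultdict(set) approach (Python 2.5+); table must be a
    defaultdict(set) so a missing pid gets an empty set automatically."""
    table[pid].add(tid)


# Hypothetical (pid, tid) pairs standing in for parsed AIX trace records;
# the repeated tids are the duplicates the original list version kept.
SAMPLE = [(100, 1), (100, 2), (100, 1), (200, 7), (200, 7), (100, 2)]


def build(add, table):
    """Feed every sample record through one of the add_* helpers."""
    for pid, tid in SAMPLE:
        add(table, pid, tid)
    return table


if __name__ == "__main__":
    result = build(add_with_check, {})
    # Sort for a stable display, since sets are unordered.
    print(sorted((pid, sorted(tids)) for pid, tids in result.items()))
    # -> [(100, [1, 2]), (200, [7])]

    # Rough timing only; numbers vary with machine and interpreter version.
    variants = [
        ("membership check", add_with_check, dict),
        ("setdefault", add_with_setdefault, dict),
        ("defaultdict", add_with_defaultdict, lambda: defaultdict(set)),
    ]
    for name, add, make_table in variants:
        seconds = timeit.timeit(lambda: build(add, make_table()), number=10000)
        print("%-16s %.4f s" % (name, seconds))
```

All three variants should produce the same de-duplicated mapping; the timing loop is only there so you can check the "should be faster" claim on your own interpreter rather than take it on faith.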

Peter
