[Tutor] Find duplicates (using dictionaries)

Kent Johnson kent37 at tds.net
Wed Feb 17 19:54:41 CET 2010


On Wed, Feb 17, 2010 at 11:31 AM, Karjer Jdfjdf <karper12345 at yahoo.com> wrote:

> I'm relatively new at Python and I'm trying to write a function that fills
> a dictionary according to the following rules and (example) data:
>
> Rules:
> * No duplicate values in field1
> * No duplicate values in field2 and field3 simultaneously (the highest value
> in field4 has to be preserved)
>
>
> Rec.no field1, field2, field3, field4
> 1. abc, def123, ghi123, 120 <-- new, insert in dictionary
> 2. abc, def123, ghi123, 120 <-- duplicate with 1. field4 same value. Do not
> insert in dictionary
> 3. bcd, def123, jkl125, 154 <-- new, insert in dictionary
> 4. efg, def123, jkl125, 175 <-- duplicate with 3 in field 2 and 3, but
> higher value in field4. Remove 3. from dict and replace with 4.
> 5. hij, ghi345, jkl125, 175 <-- duplicate field3, but not in field4. New,
> insert in dict.
>
>
> The resulting dictionary should be:
>
> hij     {'F2': ' ghi345', 'F3': ' jkl125', 'F4': 175}
> abc     {'F2': ' def123', 'F3': ' ghi123', 'F4': 120}
> efg     {'F2': ' def123', 'F3': ' jkl125', 'F4': 175}


I'm not sure I understand the rules. Something like

if (f2, f3) in list:
  if associated f4 < new f4:
    delete old (f2, f3) entry
    add new entry (f1, f2, f3, f4)
else if f1 not in list:
  add new entry (f1, f2, f3, f4)

Is that right? If so, ISTM you need a table with two indexes, one by f1 and
one by (f2, f3). This might be a good application for an in-memory sqlite
database, which can maintain indexes for you.
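A minimal sketch of the sqlite idea, assuming the rules really are the ones in my pseudocode above. The table name (recs), the index names, the function name dedupe_sqlite and the SAMPLE list are all made up for illustration; only the standard-library sqlite3 module is used. Note that, like the pseudocode, the replace branch does not re-check f1.

```python
import sqlite3

# Example records from the original question: (f1, f2, f3, f4).
SAMPLE = [
    ("abc", "def123", "ghi123", 120),
    ("abc", "def123", "ghi123", 120),
    ("bcd", "def123", "jkl125", 154),
    ("efg", "def123", "jkl125", 175),
    ("hij", "ghi345", "jkl125", 175),
]


def dedupe_sqlite(records):
    """Apply the pseudocode rules using an in-memory table with two indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE recs (f1 TEXT, f2 TEXT, f3 TEXT, f4 INTEGER)")
    # One index per lookup: by f1, and by the (f2, f3) pair.
    conn.execute("CREATE INDEX idx_f1 ON recs (f1)")
    conn.execute("CREATE INDEX idx_f2_f3 ON recs (f2, f3)")

    insert = "INSERT INTO recs (f1, f2, f3, f4) VALUES (?, ?, ?, ?)"
    for f1, f2, f3, f4 in records:
        old = conn.execute(
            "SELECT f4 FROM recs WHERE f2 = ? AND f3 = ?", (f2, f3)
        ).fetchone()
        if old is not None:
            # (f2, f3) already present: keep whichever row has the higher f4.
            if old[0] < f4:
                conn.execute(
                    "DELETE FROM recs WHERE f2 = ? AND f3 = ?", (f2, f3)
                )
                conn.execute(insert, (f1, f2, f3, f4))
        else:
            seen = conn.execute(
                "SELECT 1 FROM recs WHERE f1 = ?", (f1,)
            ).fetchone()
            if seen is None:
                conn.execute(insert, (f1, f2, f3, f4))

    rows = conn.execute("SELECT f1, f2, f3, f4 FROM recs").fetchall()
    conn.close()
    # Same shape as the dictionary shown in the question.
    return {f1: {"F2": f2, "F3": f3, "F4": f4} for f1, f2, f3, f4 in rows}


result = dedupe_sqlite(SAMPLE)
```

With the five example records this should leave the keys abc, efg and hij, which matches the expected dictionary in the question (apart from the leading spaces in the values there).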

If the number of items is not too large, you could just implement this as a
list of (f1, f2, f3, f4) tuples and search the list on every insert, but each
search scans the whole list, so that will get slow quickly as the number of
items being added grows.
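Here is a rough sketch of that list-based version, again assuming my reading of the rules is correct. The names add_record, kept and RECORDS are just placeholders for illustration. Every call scans the whole list, which is why it does not scale.

```python
# Example records from the original question: (f1, f2, f3, f4).
RECORDS = [
    ("abc", "def123", "ghi123", 120),
    ("abc", "def123", "ghi123", 120),
    ("bcd", "def123", "jkl125", 154),
    ("efg", "def123", "jkl125", 175),
    ("hij", "ghi345", "jkl125", 175),
]


def add_record(kept, new):
    """Add `new` to the list `kept` following the pseudocode rules."""
    f1, f2, f3, f4 = new
    for i, old in enumerate(kept):
        if (old[1], old[2]) == (f2, f3):
            # Same (f2, f3): replace only if the new f4 is higher.
            if old[3] < f4:
                del kept[i]
                kept.append(new)
            return  # safe: we stop iterating right after modifying the list
    # No (f2, f3) match: add only if f1 is not already present.
    if all(old[0] != f1 for old in kept):
        kept.append(new)


kept = []
for rec in RECORDS:
    add_record(kept, rec)
```

For the example data, kept should end up holding the abc, efg and hij records, in that order.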

Kent