[Tutor] Careful Dictionary Building

Michael Langford mlangford.cs03 at gtalumni.org
Fri Dec 28 18:49:00 CET 2007


This functionality already exists in the ever so useful defaultdict class
(in the collections module since Python 2.5). You pass a factory callable to
the defaultdict constructor, and whenever you look up a key that is missing,
it calls the factory and stores the result as the value for that key:

from collections import defaultdict
mydict = defaultdict(list)
for record in mylist:
    mydict[record[0]].append(record)

defaultdict has usually been fast enough for the datasets I've used it on.
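
For anyone who wants to try this, here is a minimal self-contained sketch. The
sample records and the two function names are made up for illustration; only
defaultdict itself comes from the standard library. The second function shows
the same grouping with a plain dict, testing membership on the dict directly
rather than on a key list:

```python
from collections import defaultdict

# Hypothetical sample data: each record is a list whose first item
# (index 0) is the line number used as the dictionary key.
records = [
    [10, "alpha"],
    [20, "beta"],
    [10, "gamma"],
]

def group_with_defaultdict(records):
    # list() is called automatically the first time a key is seen,
    # so append() always has a list to work on.
    grouped = defaultdict(list)
    for record in records:
        grouped[record[0]].append(record)
    return grouped

def group_with_plain_dict(records):
    # Same result with an ordinary dict. "key in grouped" is a hash
    # lookup on the dict itself; no list of keys is built.
    grouped = {}
    for record in records:
        key = record[0]
        if key in grouped:
            grouped[key].append(record)
        else:
            grouped[key] = [record]
    return grouped

by_line = group_with_defaultdict(records)
print(by_line[10])   # the two records that share line number 10
print(dict(by_line) == group_with_plain_dict(records))
```

One thing to watch for: reading a missing key with mydict[key] also inserts
it, so use "key in mydict" or mydict.get(key) when you only want to check.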

         --Michael



On 12/28/07, doug shawhan <doug.shawhan at gmail.com> wrote:
>
> *sigh* Ignore folks. I had forgotten about .has_key().
>
>
>
> On Dec 28, 2007 11:22 AM, doug shawhan <doug.shawhan at gmail.com> wrote:
>
> > I'm building a dictionary from a list with ~ 1M records.
> >
> > Each record in the list is itself a list.
> > Each record in the list has a line number, (index 0) which I wish to use
> > as a dictionary key.
> >
> > The problem: It is possible for two different records in the list to
> > share this line number. If they do, I want to append the record to the value
> > in the dictionary.
> >
> > The obvious (lazy) method of searching for doubled lines requires
> > building and parsing a key list for every record. There must be a better
> > way!
> >
> > dict = {}
> > for record in list:
> >     if record[0] in dict.keys():
> >         dict[ record[0] ].append( record )
> >     else:
> >         dict[ record[0] ] = [record]
> >
> > Once you get ~ 80,000 records it starts slowing down pretty badly (I
> > would too ...).
> >
> > Here's hoping there is a really fast, pythonic way of doing this!
> >
>
>


-- 
Michael Langford
Phone: 404-386-0495
Consulting: http://www.RowdyLabs.com