[Tutor] request for comments regarding my function

Rich Lovely roadierich at googlemail.com
Sun Oct 25 15:09:04 CET 2009


2009/10/25 Isaac <hyperneato at gmail.com>:
> Hello all,
> I wrote a function to organize a list of sorted objects ( django blog
> entries ); each object has a Python datetime attribute ( called
> pub_date ). I am posting a message to the tutor mailing list to get
> feedback regarding better ways to accomplish the same task. I have a
> feeling I could use recursion here, instead of looping through the
> entire _query list 3 times. Here is my code:
>
> def create_archive_list(_query):
>    """
>    given a sorted queryset, return a list in the format:
>
>    archive_list = [{'2009': [{'December': ['entry1', 'entry2']},
>                              {'January': ['entry3', 'entry4']}
>                              ]
>                     },
>
>                    {'2008': [{'January': ['entry5', 'entry6']},
>                              ]
>                     }
>                    ]
>    """
>
>    archive_list = []
>    tmp_year_list = []
>
>    # make a list of Python dictionaries; the keys are the year, as in
> '2009', and the value
>    # is an empty list.
>    for item in _query:
>        if item.pub_date.year not in tmp_year_list:
>            tmp_year_list.append(item.pub_date.year)
>            tmp_dict = {}
>            tmp_dict[item.pub_date.year] = []
>            archive_list.append(tmp_dict)
>        else:
>            pass
>
>    # for every list in the archive_list dictionaries, append a
> dictionary with the
>    # blog entry's month name as a key, and the value being an empty list
>    for entry in _query:
>        for item in archive_list:
>            _year = entry.pub_date.year
>            if _year in item:
>                _tmp_month = entry.pub_date.strftime("%B")
>                # make a dictionary with the month name as a key and
> an empty list as a value
>                tmp_dict = {}
>                tmp_dict[_tmp_month] = []
>
>                if tmp_dict not in item[_year]:
>                    item[_year].append(tmp_dict)
>
>    # append the blog entry object to the list if the pub_date month
> and year match
>    # the dictionary keys in the archive_list dictionary/list tree.
>    for entry in _query:
>        # dict in list
>        for item in archive_list:
>            # year of entry
>            _year = entry.pub_date.year
>            if _year in item:
>                _year_list = item[_year]
>                for _dict_month in _year_list:
>                    _tmp_month = entry.pub_date.strftime("%B")
>                    if _tmp_month in _dict_month:
>                        _dict_month[_tmp_month].append(entry)
>
>    return archive_list
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>

Instant first comment, after just a glance at your code, Can't you use
an int for the year and months, rather than strings?  (This is a
principle of the MVC (Model, View, Controller) structure:  you keep
your model (The actual data) as easy to handle as possible, and using
ints as keys makes conversion between the query and the output and
sorting easier.  Conversion from {2009:{12: ...}} to "December 2009:
..." should be handled by the view: either the GUI, or immediatly
before it's displayed.  (This may be a point of contention, it could
be argued that the view should only display it, and all manipulation
should be handled by the controller level.)

This isn't a case for recursion:  recursion is best used when you have
one thing containing a container, which contains a container, which
contains another, which contains another, etc, to (usually) unknown
depths.  (Yes, there are numeric uses for recursion, but they are
usually best replaced with an iterative approach).

It would be more pythonic to format your output as a dict of dicts (of
lists), rather than lists of single-item dicts of lists of single-item
dicts:
archive_list = \
{2009: {12: ['entry1', 'entry2'],
             1: ['entry3', 'entry4']}
            },
'2008': {1: ['entry5', 'entry6']}
}

You loose the ordering, but that isn't a problem when your keys have
implicit ordering.

Look up collections.defaultdict.  To generate this structure, you can
use dict as the factory function with defaultdict:

d = collections.defaultdict(dict)

or, more usefully,
d = collections.defaultdict(functools.partial(collections.defaultdict, list))
This will give a defaultdict of defaultdicts of lists.  (The
functools.partial just returns a callable, which is what defaultdict
needs to create its default value.

Then, as you iterate through the report, you simply assign to the dict
as follows:

for entry in month_record:
    d[year][month].append(entry)

It avoids all the messing around with tmp_year_list and so on.

Another way, using only builtins, is to use dict.setdefault():

d = {}
d.setdefault(year, {}).setdefault(month, [])

This is essentially what the defaultdict does for you.

Doing it like this will give you the advantage of being easy to read:
I'm struggling to follow what your code does at the moment:  you've
got far too many levels of nesting and temporary variables.

I don't know the structure of your query result, but using pseudo-functions:

import collections, functools

def create_archive_list(query):
    d = collections.defaultdict(functools.partial(collections.defaultdict,
list))
    for entry in query: #iterate over blog posts, rather than years,
months etc - therefore using a single pass
        d[entry.year()][entry.month()].append(entry.data())
    return d

And finally, a little utility function, which DOES use recursion:

def mapToDict(d):
    """"recursively convert defaultdicts into regular dicts"""
    d = dict(d)
    d.update((k, map_to_dict(v)) for k,v in d.items() if isinstance(v,
collections.defaultdict))
    return d


This function is more for personal comfort than regular use (although
it does clean up debugging output):  it's poor programming if code
that works with a regular dict fails if given a defaultdict.  If you
DO discover any, I highly recommend reporting it as a bug.

Like I said, I've not seen your data structure, so I'm making some big
guesses here, but this should give you enough to look into towards
improving your code.

-- 
Rich "Roadie Rich" Lovely

There are 10 types of people in the world: those who know binary,
those who do not, and those who are off by one.


More information about the Tutor mailing list