Nested dictionaries trouble

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Wed Apr 11 16:57:05 EDT 2007


IamIan wrote:
> Hello,
> 
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
> 
> import glob, traceback
> 
> years = ["2005", "2006", "2007"]
> months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
> # Create months dictionary to convert log values
> logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06","Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}

DRY violation alert! The month numbers are written out twice. Define
them once in the dict and derive the list from it:

logMonths = {
   "Jan":"01",
   "Feb":"02",
   "Mar":"03",
   "Apr":"04",
   "May":"05",
   "Jun":"06",
   "Jul":"07",
   "Aug":"08",
   "Sep":"09",
   "Oct":"10",
   "Nov":"11",
   "Dec":"12",
}

months = sorted(logMonths.values())

> # Create monthTotals dictionary with default 0 value
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>   yearTotals.setdefault(year, monthTotals)

A complicated way to write:
yearTotals = dict((year, monthTotals) for year in years)

And without even reading further, I can tell you have a problem here: 
every 'year' entry in yearTotals points to *the same* monthTotals dict 
instance. So when you update yearTotals['2007'], you see the change 
reflected for all years. The cure is simple: forget the monthTotals 
object, and define your yearTotals dict this way:

yearTotals = dict((year, dict.fromkeys(months, 0)) for year in years)

NB: for Python versions before 2.4, you need a list comprehension 
instead of a generator expression, i.e.:

yearTotals = dict([(year, dict.fromkeys(months, 0)) for year in years])
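If you want to see the shared-instance problem for yourself, here is a
minimal sketch. The names 'shared' and 'separate' and the 1024 file size
are made up for illustration, and the months list is built with a
shortcut rather than taken from your logMonths dict:

```python
years = ["2005", "2006", "2007"]
months = ["%02d" % m for m in range(1, 13)]  # "01" .. "12"

# Broken: every year maps to one and the same dict object.
monthTotals = dict.fromkeys(months, 0)
shared = dict((year, monthTotals) for year in years)
shared["2007"]["04"] += 1024  # hypothetical file size
print(shared["2005"]["04"])               # 1024: 2005 changed too
print(shared["2005"] is shared["2007"])   # True

# Fixed: dict.fromkeys() runs once per year, so each year gets its own dict.
separate = dict((year, dict.fromkeys(months, 0)) for year in years)
separate["2007"]["04"] += 1024
print(separate["2005"]["04"])                 # 0
print(separate["2005"] is separate["2007"])   # False
```

The 'is' test is the quick way to check whether two keys refer to the
same object rather than to two equal copies.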

HTH