Nested dictionaries trouble

7stud bbxx789_05ss at yahoo.com
Wed Apr 11 17:08:35 EDT 2007


IamIan wrote:
> Hello,
>
> I'm writing a simple FTP log parser that sums file sizes as it runs. I
> have a yearTotals dictionary with year keys and the monthTotals
> dictionary as its values. The monthTotals dictionary has month keys
> and file size values. The script works except the results are written
> for all years, rather than just one year. I'm thinking there's an
> error in the way I set my dictionaries up or reference them...
>
> import glob, traceback
>
> years = ["2005", "2006", "2007"]
> months = ["01","02","03","04","05","06","07","08","09","10","11","12"]
> # Create months dictionary to convert log values
> logMonths = {"Jan":"01","Feb":"02","Mar":"03","Apr":"04","May":"05","Jun":"06",
>              "Jul":"07","Aug":"08","Sep":"09","Oct":"10","Nov":"11","Dec":"12"}
> # Create monthTotals dictionary with default 0 value
> monthTotals = dict.fromkeys(months, 0)
> # Nest monthTotals dictionary in yearTotals dictionary
> yearTotals = {}
> for year in years:
>   yearTotals.setdefault(year, monthTotals)
>
> currentLogs = glob.glob("/logs/ftp/*")
>
> try:
>   for currentLog in currentLogs:
>     readLog = open(currentLog,"r")
>     for line in readLog.readlines():
>       if not line: continue
>       if len(line) < 50: continue
>       logLine = line.split()
>
>       # The 2nd element is month, 5th is year, 8th is filesize
>       # Counting from zero:
>
>       # Lookup year/month pair value
>       logMonth = logMonths[logLine[1]]
>       currentYearMonth = yearTotals[logLine[4]][logMonth]
>
>       # Update year/month value
>       currentYearMonth += int(logLine[7])
>       yearTotals[logLine[4]][logMonth] = currentYearMonth
> except:
>   print "Failed on: " + currentLog
>   traceback.print_exc()
>
> # Print dictionaries
> for x in yearTotals.keys():
>   print "KEY",'\t',"VALUE"
>   print x,'\t',yearTotals[x]
>   #print "  key",'\t',"value"
>   for y in yearTotals[x].keys():
>     print "  ",y,'\t',yearTotals[x][y]
>
>
> Thank you,
> Ian


1) You have this setup:

    logMonths = {"Jan":"01", "Feb":"02",...}
    yearTotals = {
        "2005":{"01":0, "02":0, ....}
        "2006":
        "2007":
    }

Then when you get a result such as "Jan", you look up "Jan" in the
logMonths dictionary to get "01".  Then you use "01" and the year, say
"2005", to look up the value in the yearTotals dictionary.  What is
the point of even having the logMonths dictionary?  Why not make "Jan"
the key in the "2005" dictionary and look it up directly:

    yearTotals = {
        "2005":{"Jan":0, "Feb":0, ....}
        "2006":
        "2007":
    }

That way you could completely eliminate the lookup in the logMonths
dict.
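
A minimal sketch of that direct lookup. The month table is shortened and
the values are made up for illustration; the only point is that the
month field from the log can be used as the key as-is:

```python
# Shortened month table for one year, keyed by the abbreviations
# exactly as they appear in the log (no "Jan" -> "01" conversion).
totals_2005 = dict.fromkeys(["Jan", "Feb", "Mar"], 0)

month_field = "Jan"             # stands in for logLine[1]
totals_2005[month_field] += 7   # direct lookup, no logMonths needed

print(totals_2005["Jan"])
```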

2) In this part:

    logMonth = logMonths[logLine[1]]
    currentYearMonth = yearTotals[logLine[4]][logMonth]
    # Update year/month value
    currentYearMonth += int(logLine[7])
    yearTotals[logLine[4]][logMonth] = currentYearMonth

I'm not sure why you are using all those intermediate steps.  How
about:

    yearTotals[logLine[4]][logLine[1]] += int(logLine[7])

To me that is a lot clearer (it assumes the month abbreviations are the
keys, as suggested in 1).  Or, you could do this:

    year, month, val = logLine[4], logLine[1], int(logLine[7])
    yearTotals[year][month] += val
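
To see the unpacking on something concrete, here is a rough sketch with
a made-up log line.  The line itself and the trimmed-down yearTotals are
assumptions for illustration; only the field positions (1, 4 and 7) come
from your script:

```python
# Made-up log line; index 1 = month, index 4 = year, index 7 = file size.
line = "Wed Apr 11 17:08:35 2007 1 host.example.com 2048 /pub/file.txt"

# Trimmed-down table, keyed by month abbreviation as suggested in 1).
yearTotals = {"2007": {"Apr": 0}}

logLine = line.split()
year, month, val = logLine[4], logLine[1], int(logLine[7])
yearTotals[year][month] += val

print(yearTotals["2007"]["Apr"])
```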

3)
>I'm thinking there's an error in the way
>I set my dictionaries up or reference them

Yep.  It's right here:

    for year in years:
      yearTotals.setdefault(year, monthTotals)

Every year refers to the same monthTotals dict, so an update made
through one year shows up under all of them.  You can use a dict's
copy() method to give each year its own copy:

    monthTotals.copy()
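
A small sketch that shows the difference side by side.  The names
template, shared and separate are made up for the example, and the month
list is shortened:

```python
months = ["Jan", "Feb"]                 # shortened for the example
template = dict.fromkeys(months, 0)

shared = {}
separate = {}
for year in ["2005", "2006"]:
    shared[year] = template             # every year gets the SAME dict object
    separate[year] = template.copy()    # every year gets its own dict

shared["2005"]["Jan"] += 10
separate["2005"]["Jan"] += 10

print(shared["2005"] is shared["2006"])   # True: one object, two names
print(shared["2006"]["Jan"])              # 10: the 2005 update leaked into 2006
print(separate["2006"]["Jan"])            # 0: 2006 is untouched
```

Note that copy() is a shallow copy.  That is enough here because the
values are plain integers.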

Here is a reworking of your code that also eliminates a lot of typing:

import calendar, pprint

years = ["200%s" % x for x in range(5, 8)]
print years

# calendar.month_abbr[0] is an empty string, so the real months are [1:]
months = list(calendar.month_abbr)
print months

monthTotals = dict.fromkeys(months[1:], 0)
print monthTotals

yearTotals = {}
for year in years:
    yearTotals.setdefault(year, monthTotals.copy())
pprint.pprint(yearTotals)

logs = [
["", "Feb", "", "", "2007", "", "", "12"],
["", "Jan", "", "", "2005", "", "", "3"],
["", "Jan", "", "", "2005", "", "", "7"],
]

for logLine in logs:
    year, month, val = logLine[4], logLine[1], int(logLine[7])
    yearTotals[year][month] += val

for x in yearTotals.keys():
    print "KEY", "\t", "VALUE"
    print x, "\t", yearTotals[x]
    for y in yearTotals[x].keys():
        print "   ", y, "\t", yearTotals[x][y]
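
One side effect of keying on month abbreviations is that the keys no
longer sort into calendar order the way "01", "02", ... did.  If the
output order matters, one possible approach is to walk the months in
the order calendar gives them instead of relying on keys().  This is
only a sketch with made-up totals, and it assumes the default (English)
month names:

```python
import calendar

# "Jan" .. "Dec" in calendar order; index 0 of month_abbr is an empty string.
month_names = list(calendar.month_abbr)[1:]

# Made-up data for illustration.
yearTotals = {"2005": dict.fromkeys(month_names, 0)}
yearTotals["2005"]["Jan"] = 10

lines = []
for year in sorted(yearTotals):        # years in ascending order
    for month in month_names:          # months in calendar order
        lines.append("%s %s\t%d" % (year, month, yearTotals[year][month]))

print("\n".join(lines))
```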



