[Tutor] Using a dictionary to keep a count

Sat Aug 2 12:39:01 2003

Hello all,

I am trying to write a script that will read a log
file and count the number of times each CGI script has
been accessed by a particular site. I am using a
dictionary to keep the counts per script, but there
must be a logic error since the final counts are
wrong. Can somebody please point me in the right
direction? I can make it work using a list approach
but I believe that a dictionary will be more
efficient.

Here are the sample input file (access_log) and the
Python script (logParse.py):

############## access_log ####################### 
alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /
alpha.umn.edu --[24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP /
mcgraw.com --[24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP /
rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP /
rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP /
idg.com --[24/Feb/1997:09:05:35 -0700] "POST /cgi-bin/script2.cgi HTTP /
##################################################

################################ logParse.py ######################################
# logParse.py
# Test program to read a log file and track the number of times some scripts
# have been accessed by specific sites.
# This is the dictionary/dictionary approach.
###############

# This function is used to extract the script name and the address from
# each line. Since the structure of the line is well defined, this can be
# done without using regular expressions.
# Sample line:
# alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /

def process_line(aline):
    address = aline.split('--')[0].strip()  # split at '--' and get first piece
    rest = aline.split("POST")[1].strip()   # split at 'POST' to get the distal portion
    script = rest.split()[0]                # split that piece at the whitespace before
                                            # 'HTTP'; the script is the first fragment
    return (script, address)                # return both elements in a tuple

####################### Main Program #######################

# Initialize a dictionary that will contain the scripts as keys and the
# sites as values.
dict = {}

# Read the file line by line and extract the script name and the address.
for line in file('access_log'):
    script, address = process_line(line)   
    # Add the script to the dictionary as a key if not already in there,
    # otherwise just add the address as a value in a dictionary for the key:
    dict.setdefault(script, {})[address] = 1

    # if the address had already been seen for the same key, we must
    # increment its count:
    if address in dict[script].keys():
        dict[script][address] += 1

# Now print the dictionary to verify what we have done:

for key in dict.keys():
    print key, " => "
    for value in dict[key].keys():
        print value,":", dict[key][value]

    print "\n"    
################################### End ###################################
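
Tracing the main loop by hand suggests where the counts may go wrong: the
setdefault line assigns 1 to the address on every pass, which discards any
earlier count, and the "if" test that follows is then always true, so every
count ends up as 2. Below is a minimal sketch of one possible fix, not a
definitive version. It assumes Python 3 syntax (print as a function), uses a
hypothetical helper named count_hits, avoids the name "dict" because that
shadows the built-in type, and feeds in the sample lines from this post
instead of opening access_log.

```python
# Sample entries copied from the access_log in this post, one entry per line.
sample_lines = [
    'alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'alpha.umn.edu --[24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'mcgraw.com --[24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP /',
    'rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP /',
    'rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'idg.com --[24/Feb/1997:09:05:35 -0700] "POST /cgi-bin/script2.cgi HTTP /',
]


def process_line(aline):
    """Return (script, address) for one log line, same logic as logParse.py."""
    address = aline.split('--')[0].strip()
    rest = aline.split("POST")[1].strip()
    script = rest.split()[0]
    return (script, address)


def count_hits(lines):
    """Hypothetical helper: build a {script: {address: count}} dictionary."""
    counts = {}
    for line in lines:
        script, address = process_line(line)
        # Fetch the inner dictionary for this script, creating it if needed.
        per_site = counts.setdefault(script, {})
        # Start from 0 when the address is new, then add one hit.
        per_site[address] = per_site.get(address, 0) + 1
    return counts


counts = count_hits(sample_lines)
for script in sorted(counts):
    print(script + " =>")
    for address in sorted(counts[script]):
        print("  " + address + ": " + str(counts[script][address]))
```

With the six sample lines, this reports 2 hits for alpha.umn.edu on
script1.cgi and 1 hit for every other script/site pair. The key change is
that the count is read with get(address, 0) before it is written, so an
existing count is never reset.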

Thanks,
Levy Lazarre        
