[Tutor] Using a dictionary to keep a count
Levy Lazarre
llazarre@yahoo.com
Sat Aug 2 12:39:01 2003
Hello all,
I am trying to write a script that will read a log
file and count the number of times each CGI script has
been accessed by a particular site. I am using a
dictionary to keep the counts per script, but there
must be a logic error since the final counts are
wrong. Can somebody please point me in the right
direction? I can make it work using a list approach,
but I believe that a dictionary will be more
efficient.
Here are the sample input file (access_log) and the
Python script (logParse.py):
############## access_log #######################
alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /
alpha.umn.edu --[24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP /
mcgraw.com --[24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP /
rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP /
rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP /
idg.com --[24/Feb/1997:09:05:35 -0700] "POST /cgi-bin/script2.cgi HTTP /
##################################################
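For reference, the six entries above reduce to five distinct (script, site) pairs, and a correct script should report alpha.umn.edu twice for script1.cgi and every other pair once. A minimal sketch that checks this, assuming Python 2.7 or later (for `collections.Counter`) and with the pairs typed in by hand rather than parsed from the file:

```python
from collections import Counter

# (script, address) pairs read off the six sample log entries, typed in by hand.
pairs = [
    ('/cgi-bin/script1.cgi', 'alpha.umn.edu'),
    ('/cgi-bin/script1.cgi', 'alpha.umn.edu'),
    ('/cgi-bin/script2.cgi', 'mcgraw.com'),
    ('/cgi-bin/script2.cgi', 'rohcs.ats.com'),
    ('/cgi-bin/script1.cgi', 'rohcs.ats.com'),
    ('/cgi-bin/script2.cgi', 'idg.com'),
]

# A flat dictionary keyed on the (script, address) tuple is enough to
# count hits; Counter adds 1 for each occurrence of a key.
expected = Counter(pairs)

for (script, address), n in sorted(expected.items()):
    print("%s  %s: %d" % (script, address, n))
```

Keying on a tuple avoids the nested dictionary entirely; the nested form in logParse.py is still handy when the output should be grouped by script.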
################################ logParse.py ######################################
# logParse.py
# Test program to read a log file and track the number of times some scripts
# have been accessed by specific sites.
# This is the dictionary/dictionary approach.
###############
# This function is used to extract the script name and the address from each line.
# Since the structure of the line is well defined, this can be done without using
# regular expressions.
# Sample line:
# alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /
def process_line(aline):
    address = aline.split('--')[0].strip()  # split at '--' and get first piece
    rest = aline.split("POST")[1].strip()    # split at 'POST' to get the distal portion
    script = rest.split()[0]                 # split that piece at the whitespace before 'HTTP',
                                             # and the script is the first fragment
    return (script, address)                 # return both elements in a tuple
####################### Main Program #######################
# Initialize a dictionary that will contain the scripts as keys and the sites as values.
dict = {}
# Read the file line by line and extract the script name and the address.
for line in file('access_log'):
    script, address = process_line(line)
    # Add the script to the dictionary as a key if not already in there, otherwise
    # just add the address as a value in a dictionary for the key:
    dict.setdefault(script, {})[address] = 1
    # if the address had already been seen for the same key, we must increment its count:
    if address in dict[script].keys():
        dict[script][address] += 1
# Now print the dictionary to verify what we have done:
for key in dict.keys():
    print key, " => "
    for value in dict[key].keys():
        print value, ":", dict[key][value]
    print "\n"
################################### End ###################################
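The counts come out wrong because `dict.setdefault(script, {})[address] = 1` overwrites the stored count with 1 on every line, and the `if` test that follows is then always true (the address was just stored), so every script/site pair ends up reported as 2. One possible fix, sketched below, fetches the inner dictionary once and adds to its existing count with `get(address, 0) + 1`. It assumes the log lines are passed in as a list so the example runs without the file; `count_accesses` and `sample` are hypothetical names, and the outer dictionary is called `counts` so it no longer shadows the built-in `dict`:

```python
def process_line(aline):
    # Same splitting logic as logParse.py: the address is everything
    # before '--', and the script is the first word after 'POST'.
    address = aline.split('--')[0].strip()
    rest = aline.split("POST")[1].strip()
    script = rest.split()[0]
    return (script, address)

def count_accesses(lines):
    # counts maps script -> {address: number of hits}
    counts = {}
    for line in lines:
        script, address = process_line(line)
        # Fetch (or create) the inner dictionary for this script once...
        per_site = counts.setdefault(script, {})
        # ...then add one to the existing count, starting from 0 for a new site.
        per_site[address] = per_site.get(address, 0) + 1
    return counts

# Hypothetical in-memory copy of the sample access_log entries.
sample = [
    'alpha.umn.edu --[24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'alpha.umn.edu --[24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'mcgraw.com --[24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP /',
    'rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP /',
    'rohcs.ats.com --[24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP /',
    'idg.com --[24/Feb/1997:09:05:35 -0700] "POST /cgi-bin/script2.cgi HTTP /',
]

counts = count_accesses(sample)
for script in sorted(counts):
    print("%s =>" % script)
    for address in sorted(counts[script]):
        print("    %s: %d" % (address, counts[script][address]))
```

To run it on the real file, pass the open file instead, e.g. `count_accesses(open('access_log'))`; iterating over a file yields one line at a time, and the trailing newline is discarded by `split()`. From Python 2.5 on, `collections.defaultdict(int)` for the inner dictionaries would let the increment be written as a plain `+= 1`.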
Thanks,
Levy Lazarre