[Tutor] Counting and grouping dictionary values in Python 2.7

Bruce Dykes bkd69ster at gmail.com
Fri Jul 8 09:22:46 EDT 2016


I'm compiling application logs from a bunch of servers, reading the log
entries, parsing each log entry into a dictionary, and compiling all the
log entries into a single list of dictionaries. At present, all I'm doing
with it is writing the list of dictionaries to a .csv file. To date,
we've been able to get by doing some basic analysis by simply using grep
and wc, but I need to do more with it now.

Here's what the data structures look like:

NY = ['BX01','BX02','BK01','MN01','SI01']
NJ = ['NW01','PT01','PT02']
CT = ['ST01','BP01','NH01']

sales = [
{'store':'store','date':'date','time':'time','state':'state','transid':'transid','product':'product','price':'price'},
{'store':'BX01','date':'8','time':'08:55','state':'NY','transid':'387','product':'soup','price':'2.59'},
{'store':'NW01','date':'8','time':'08:57','state':'NJ','transid':'24','product':'apples','price':'1.87'},
{'store':'BX01','date':'8','time':'08:56','state':'NY','transid':'387','product':'crackers','price':'3.44'}]

The first group of lists, with the store IDs for each state, is there to
add the state information to the compiled log, as it's not included in the
application log. The first dictionary in the sales list, with the key names
duplicated in the value fields, is there to provide a header line as the
first line in the compiled .csv file.
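
For illustration, here is a minimal sketch of how the three lists might be
used to tag an entry with its state. The names STATE_BY_STORE and add_state
are hypothetical, as is the 'UNKNOWN' fallback for a store ID that appears
in none of the lists; the actual parsing script may do this differently.

```python
NY = ['BX01', 'BX02', 'BK01', 'MN01', 'SI01']
NJ = ['NW01', 'PT01', 'PT02']
CT = ['ST01', 'BP01', 'NH01']

# Invert the three lists into a single store -> state lookup table.
# STATE_BY_STORE is a hypothetical name used only for this sketch.
STATE_BY_STORE = {}
for state, stores in (('NY', NY), ('NJ', NJ), ('CT', CT)):
    for store in stores:
        STATE_BY_STORE[store] = state


def add_state(entry):
    """Return a copy of a parsed log entry with its 'state' key filled in.

    Assumption: a store ID missing from all three lists is tagged 'UNKNOWN'.
    """
    tagged = dict(entry)
    tagged['state'] = STATE_BY_STORE.get(entry['store'], 'UNKNOWN')
    return tagged
```

Building the lookup dictionary once means each entry is tagged with a single
dictionary access instead of a search through three lists.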

Now, what I need to do with this is arbitrarily count and total the values in
the dictionaries, e.g. the total amount and number of items for transaction
id 387, or the total number of crackers sold in NJ stores. I think the
collections library has the functions I need, but I haven't been able to
grok the example uses I've seen online. Likewise, I know I could build a
lot of what I need using regex and lists, etc., but if Python 2.7 already
has the blocks there to be used, well, let's use the blocks then.
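
To make the two example queries concrete, here is one possible sketch using
collections.Counter and collections.defaultdict, both available in Python
2.7. The helper names count_and_total and count_matching are made up for
this illustration, and prices are converted with float() for brevity
(decimal.Decimal would be the safer choice for money).

```python
from collections import Counter, defaultdict

sales = [
    {'store': 'store', 'date': 'date', 'time': 'time', 'state': 'state',
     'transid': 'transid', 'product': 'product', 'price': 'price'},
    {'store': 'BX01', 'date': '8', 'time': '08:55', 'state': 'NY',
     'transid': '387', 'product': 'soup', 'price': '2.59'},
    {'store': 'NW01', 'date': '8', 'time': '08:57', 'state': 'NJ',
     'transid': '24', 'product': 'apples', 'price': '1.87'},
    {'store': 'BX01', 'date': '8', 'time': '08:56', 'state': 'NY',
     'transid': '387', 'product': 'crackers', 'price': '3.44'},
]

# The first dictionary is only the .csv header line, so skip it.
rows = sales[1:]


def count_and_total(rows, key):
    """Group rows by row[key]; return (item counts, price totals)."""
    counts = Counter()            # missing keys start at 0
    totals = defaultdict(float)   # missing keys start at 0.0
    for row in rows:
        counts[row[key]] += 1
        totals[row[key]] += float(row['price'])
    return counts, totals


def count_matching(rows, **criteria):
    """Count rows whose fields equal every keyword argument given."""
    return sum(1 for row in rows
               if all(row[k] == v for k, v in criteria.items()))


# Number of items and total amount per transaction id.
counts, totals = count_and_total(rows, 'transid')

# Number of cracker sales in NJ stores.
nj_crackers = count_matching(rows, state='NJ', product='crackers')
```

With the sample data, counts['387'] is 2 and totals['387'] is about 6.03.
Passing 'product' or 'state' as the key instead of 'transid' groups on that
field without any other change.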

Also, is there any particular advantage to pickling the list and having two
files, one, the pickled file to be read as a data source, and the .csv file
for portability/readability, as opposed to just a single .csv file that
gets reparsed by the reporting script?
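
As a rough comparison of the two options, the sketch below round-trips the
data through pickle and reparses the same records from CSV text with
csv.DictReader. The csv_lines list stands in for the lines of the compiled
.csv file and is an assumption made for this example; in a real script the
reader would be given an open file object instead.

```python
import csv
import pickle

sales = [
    {'store': 'BX01', 'date': '8', 'time': '08:55', 'state': 'NY',
     'transid': '387', 'product': 'soup', 'price': '2.59'},
    {'store': 'NW01', 'date': '8', 'time': '08:57', 'state': 'NJ',
     'transid': '24', 'product': 'apples', 'price': '1.87'},
]

# Option 1: pickle. The list of dictionaries comes back exactly as stored.
blob = pickle.dumps(sales, pickle.HIGHEST_PROTOCOL)
from_pickle = pickle.loads(blob)

# Option 2: reparse the CSV. DictReader takes its keys from the header
# line, so the header never shows up as a data row.
# Assumption: csv_lines mimics the contents of the compiled .csv file.
csv_lines = [
    'store,date,time,state,transid,product,price',
    'BX01,8,08:55,NY,387,soup,2.59',
    'NW01,8,08:57,NJ,24,apples,1.87',
]
from_csv = [dict(row) for row in csv.DictReader(csv_lines)]
```

Because every value in these dictionaries is already a string, both routes
produce the same list here. Pickle would mainly matter if the entries held
non-string values (numbers, dates) that a CSV reparse would have to convert
back.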

Thanks in advance
bkd

