aggregation for a nested dict

Thu Dec 2 15:40:17 EST 2010

On 12/02/2010 01:49 PM, MRAB wrote:
> On 02/12/2010 19:01, chris wrote:
>> I would like to parse many thousands of files and aggregate the counts
>> for the field entries related to every id.
>>
>> extract_field greps the identifier for the fields with a regex.
>>
>> result = [ { extract_field("id", line) : [extract_field("field1",
>> line),extract_field("field2", line)]}  for line  in FILE ]
>>
>> I'd like to aggregate them for every line, or maybe every file, and
>> get this once the complete parsing procedure is done:
>>
>> {'a': {'0':2, '84':2}}
>> {'b': {'1000':1,'83':1,'84':1} }

I'm not sure what happened to b['0'] based on your initial data, 
but assuming that was an oversight...

> from collections import defaultdict
>
> aggregates = defaultdict(lambda: defaultdict(int))
> for entry in result:
>     for key, values in entry.items():
>         for v in values:
>             aggregates[key][v] += 1

Or, if you don't need the intermediate result, you can tweak 
MRAB's solution and just iterate over the file(s):

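   from collections import defaultdict

   aggregates = defaultdict(lambda: defaultdict(int))
   for line in FILE:
     # one inner counter dict per id; both fields feed the same counter
     key = extract_field("id", line)
     aggregates[key][extract_field("field1", line)] += 1
     aggregates[key][extract_field("field2", line)] += 1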
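
For a quick sanity check, here's a self-contained sketch of that loop.
The extract_field() below is only a hypothetical stand-in (a regex over
"name=value" pairs), and the sample lines are made up, since I don't know
what your real helper or file format look like:

```python
import re
from collections import defaultdict

def extract_field(name, line):
    # Hypothetical stand-in for the OP's helper: assumes "name=value" pairs.
    match = re.search(r"\b%s=(\S+)" % re.escape(name), line)
    return match.group(1) if match else None

# Made-up sample data standing in for the real file(s).
FILE = [
    "id=a field1=0 field2=84",
    "id=a field1=0 field2=84",
    "id=b field1=1000 field2=83",
    "id=b field1=0 field2=84",
]

aggregates = defaultdict(lambda: defaultdict(int))
for line in FILE:
    key = extract_field("id", line)
    aggregates[key][extract_field("field1", line)] += 1
    aggregates[key][extract_field("field2", line)] += 1

# Convert the nested defaultdicts to plain dicts for printing/comparison.
plain = dict((k, dict(v)) for k, v in aggregates.items())
print(plain)
```

With that sample data, 'a' ends up with {'0': 2, '84': 2} and 'b' gets one
count each for '1000', '83', '0' and '84'.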

or, if you're using an older version of Python (<2.5) that doesn't
provide collections.defaultdict, you could do something like

   aggregates = {}
   for line in FILE:
     key = extract_field("id", line)
     d = aggregates.setdefault(key, {})
     for fieldname in ('field1', 'field2'):
       value = extract_field(fieldname, line)
       d[value] = d.get(value, 0) + 1
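
If it's not obvious what setdefault() and get() are doing there, this
small sketch may help. It uses made-up (id, value) pairs in place of the
extract_field() calls, so the data is purely an assumption for
illustration:

```python
# Made-up (id, value) pairs standing in for extract_field() results.
pairs = [("a", "0"), ("a", "84"), ("a", "0"), ("b", "1000")]

aggregates = {}
for key, value in pairs:
    # setdefault returns the existing inner dict for this id,
    # or stores a new empty one and returns that.
    d = aggregates.setdefault(key, {})
    # get supplies 0 the first time a value is seen for this id.
    d[value] = d.get(value, 0) + 1

print(aggregates)
```

The result is an ordinary dict of dicts, so unlike the defaultdict
version, looking up an id that was never seen raises KeyError instead of
quietly creating an empty entry.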

-tkc