Counter for items in lists in lists?

Sat Sep 25 05:11:52 EDT 2004

Charlotte Henkle <charlotte at fgm.com> wrote:

> Hello;
> 
> I'm pondering how to count the number of times an item appears in
> total in a nested list.  For example:
> 
> myList=[[a,b,c,d],[a,f,g,h],[a,b,x,y]]
> 
> I'd like to know that a appeared three times, and b appeared twice,
> and the rest appeard only once.

Two problems enmeshed, here: how to count items of various values if you
have a way to iterate over the values; and how to iterate on a nested
list.  You may code them together, but factoring them as two reusable
functions is preferable -- you'll end up with more reusable building
blocks than by coding them as one monolithic function.  Python is good
at allowing such factorings.  There may be a third sub-problem, about
how best to _present_ the counts -- hard to say from your statement
about the problem, whether that's a problem, but, see later.

You're not telling us what is the type of the items.  If they're strings
or numbers, basically anything that can be a key to a dictionary, the
counting problem is best solved by building a dictionary:

def itemcount(sequence):
    d = {}
    for item in sequence: d[item] = 1 + d.get(item, 0)
    return d

this kind of function is known as an 'accumulator': given a sequence it
loops over it computing/building some resulting object and returns it.
Not a specific Python term, but a nice one to keep in mind.

You don't clarify the structure of your list -- is it always two levels
as you present it, could it be more, are all items always going to be at
the same level or could some level have a mixture of items and sublists?
Taking the simplest hypothesis -- always two levels and all items at the
second level, the first one being made up of sublists only:

def itemseq(nested_list):
    for sublist in nested_list:
        for item in sublist: yield item

this kind of function is known as a 'simple generator': it produces a
sequence you may loop on with a for statement, pass to an accumulator,
etc.  This is a specific Python term, and the keyword that lets you tell
whether a function is a generator is 'yield'.

So, given these two functions -- or enhanced equivalents if your list or
its items don't meet these hypotheses -- the dict of counts is easy to
obtain:

myCounts = itemcount(itemseq(myList))

Now for the presentation issue -- you say you want to receive the
information "that a appeared three times, and b appeared twice,
and the rest appeard only once".  This appears to mean that you want to
get a sort of items by count, with items appearing only once just
summarized as a group, not listed individually.  If I correctly divined
your intention, this might help:

def summarize_counts(counts_dict):
    number_of_singletons = 0
    count_per_item_list = []
    for item, count in counts_dict.iteritems():
        if count == 1: number_of_singletons += 1
        else: count_per_item_list.append((count, item))
    count_per_item_list.sort()
    for count, item in count_per_item_list:
        print 'Item %r appeared %d times' % (item, count)
    print '%d items appeared only once each' % number_of_singletons

So, overall, if I've read your mind correctly,

    summarize_counts(itemcount(itemseq(myList)))

should do exactly what you required.  However, mindreading is always an
iffy business (that's part of why the Zen of Python says "explicit is
better than implicit"...:-).  So, if you can clarify the various
ambiguous and uncertain details of your specs, which forced me to guess
(violating another Zen injunction: "in the face of ambiguity, resist the
temptation to guess"), over and over, at what exactly you meant, I'm
confident that the various pieces presented here can be enhanced to meet
whatever your requirements actually _are_.

Alex