referencing a subhash for generalized ngram counting

Scott David Daniels Scott.Daniels at Acm.Org
Tue Nov 13 22:25:06 EST 2007


braver wrote:
> ...
> The real-life motivation for this is n-gram counting.  Say you want to
> maintain a hash for bigrams.  For each two subsequent words a, b in a
> text, you do
> bigram_count[a][b] += 1

This application is easily handled with tuples as keys.

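     bigrams = {}
     src = iter(source)          # source: any iterable of words
     lag = next(src)             # first word; raises StopIteration if source is empty
     for current in src:
         # the tuple (lag, current) is hashable, so it works as a dict key
         bigrams[lag, current] = bigrams.get((lag, current), 0) + 1
         lag = current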
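The lag-variable loop above handles pairs, but the same tuple-key idea extends to any n by keeping a sliding window of the last n words. The following is a minimal sketch, assuming Python 3. It uses `collections.Counter` in place of `dict.get(..., 0)`, which is a substitution rather than the code above, and the function name `ngram_counts` is hypothetical:

```python
from collections import Counter, deque
from typing import Hashable, Iterable


def ngram_counts(source: Iterable[Hashable], n: int = 2) -> Counter:
    """Count n-grams in source, keyed by n-tuples of consecutive items.

    Hypothetical helper for illustration; generalizes the bigram loop.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    window: deque = deque(maxlen=n)  # holds only the last n items seen
    for item in source:
        window.append(item)           # oldest item drops off once full
        if len(window) == n:
            counts[tuple(window)] += 1  # tuple of n words is the key
    return counts


words = "the cat sat on the cat".split()
bigrams = ngram_counts(words, 2)
print(bigrams[("the", "cat")])  # 2
```

Because the window fills before anything is counted, an input shorter than n simply yields an empty `Counter` instead of raising `StopIteration`.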

But if you really want nested dictionaries:

     bigrams = {}
     src = iter(source)
     lag = next(src)
     for current in src:
         # setdefault creates the inner dict for lag the first time it is seen
         count = bigrams.setdefault(lag, {}).get(current, 0)
         bigrams[lag][current] = count + 1
         lag = current
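To keep the literal `bigram_count[a][b] += 1` from the question, a `collections.defaultdict` whose factory builds another `defaultdict(int)` creates the inner dict and the zero count automatically. This is a sketch assuming Python 3, not the approach shown above; `nested_bigram_counts` is a hypothetical helper name:

```python
from collections import defaultdict


def nested_bigram_counts(source):
    """Hypothetical helper: first word -> {second word -> count}."""
    # Missing outer keys get a fresh defaultdict(int); missing inner keys get 0.
    bigram_count = defaultdict(lambda: defaultdict(int))
    src = iter(source)
    try:
        lag = next(src)
    except StopIteration:
        return bigram_count  # empty input: no bigrams
    for current in src:
        bigram_count[lag][current] += 1  # the update from the original question
        lag = current
    return bigram_count


counts = nested_bigram_counts("the cat sat on the cat".split())
print(counts["the"]["cat"])  # 2
print(dict(counts["cat"]))   # {'sat': 1}
```

One caveat with this design: merely reading a missing key, such as `counts["dog"]["cat"]`, inserts empty entries as a side effect, so use `in` or `.get()` for lookups that should not create keys.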

-Scott David Daniels
Scott.Daniels at Acm.Org



More information about the Python-list mailing list