updating dictionaries from/to dictionaries

Mon Aug 11 22:26:44 EDT 2008

John:

> "append"? Don't you mean "add"???

Yes, that is what I meant, my apologies.

> What you need to do is practice translating from your
> requirements into Python, and it's not all that hard:
>
> "run a loop through foo" -> for key in foo:
> "match any of its keys that also exist in bar" -> if key in bar:
> "add those keys' values in bar to the preexisting value for the
> corresponding key in foo" -> foo[key] += bar[key]

Due to my current level of numbskullery, when I start to see things
like tuples as keys, the apparent ease of this evaporates in front of
my eyes!  I know that I need more practice, though, and it will come.
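
For what it's worth, the three-line translation quoted above does not
change when the keys are tuples. A minimal sketch, assuming made-up
bigram data (the contents of foo and bar here are hypothetical, chosen
only for illustration):

```python
# Hypothetical data: foo holds a starting value for every bigram of
# interest, bar holds observed counts for some of them.
# Each key is a (first_word, second_word) tuple.
foo = {("the", "cat"): 1, ("cat", "sat"): 1, ("sat", "the"): 1}
bar = {("the", "cat"): 3, ("cat", "sat"): 2}

for key in foo:               # "run a loop through foo"
    if key in bar:            # "match any of its keys that also exist in bar"
        foo[key] += bar[key]  # "add bar's value to the value in foo"

print(foo)
```

A tuple is hashed and compared as a single value, so "key in bar" and
"foo[key]" behave just as they would for a string key.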
>
> But you also need to examine your requirements:
> (1) on a mechanical level, as I tried to point out in my first
> response, if as you say all keys in bar are also in foo, you can
> iterate over bar instead of foo, which will be faster.
> (2) at a higher level, it looks like bar contains a key for every
> possible bigram, and you are tallying actual counts in bar, and what
> you want out for any bigram is (1 + number_of_occurrences) i.e.
> Laplace adjustment. Are you sure you really need to do this two-dict
> caper? Consider using only one dictionary (zot):
>
> Initialise:
> zot = {}
>
> To tally:
> if key in zot:
>    zot[key] += 1
> else:
>    zot[key] = 1
>
> Adjusted count (irrespective of whether bigram exists or not):
> zot.get(key, 0) + 1
>
> This method uses space proportional to the number of bigrams that
> actually exist. You might also consider collections.defaultdict, but
> such a dict may end up containing entries for keys that you ask about
> (depending on how you ask), not just ones that exist.
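
The single-dictionary suggestion and the defaultdict caveat can be
sketched roughly as follows. The token list and the adjusted() helper
are assumptions made for illustration; only zot, dict.get and
collections.defaultdict come from the quoted text.

```python
from collections import defaultdict

# Hypothetical training text, split into bigram tuples.
tokens = "the cat sat on the cat".split()
bigrams = list(zip(tokens, tokens[1:]))

# Tally with a plain dict; get() supplies 0 for unseen keys.
zot = {}
for key in bigrams:
    zot[key] = zot.get(key, 0) + 1

def adjusted(key):
    """Laplace-adjusted count: works whether or not key was ever seen."""
    return zot.get(key, 0) + 1

# The same tally with defaultdict(int).
dd = defaultdict(int)
for key in bigrams:
    dd[key] += 1

size_before = len(dd)
dd[("dog", "ran")]           # indexing a missing key inserts it with value 0
dd.get(("cat", "ran"), 0)    # get() does not insert anything
size_after = len(dd)         # grew by exactly one entry
```

The plain dict never grows when it is only queried, which is the
space advantage described in the quoted text. The defaultdict grows
whenever a missing key is looked up with square brackets.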

You are quite right about the Laplace adjustment.  However, a more
precise statement of my overall problem involves training and testing
that use bigram probabilities derived in part from the Laplace
adjustment.  As I understand the workflow I should follow, I can't
restrict myself to the bigrams that actually occur in training: if I
did, my overall probability during testing would drop to 0 as soon as
a test bigram that doesn't occur in training is encountered.  Hence my
desire to enumerate all possible bigrams in train (having taken steps
to ensure proper set relations between train and test).  The best way
I can currently see to do this is my two-dictionary "caper", iterating
over foo, not bar :)
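
A rough sketch of that two-dictionary workflow, written for Python 3.
The training text, the vocab list and the prob() helper are
hypothetical; the actual data and probability formula in use may well
differ.

```python
from itertools import product

# Hypothetical training tokens and the vocabulary derived from them.
train = "the cat sat on the mat".split()
vocab = sorted(set(train))

# foo: every possible bigram over the vocabulary, starting at 1
# (the Laplace adjustment).
foo = {pair: 1 for pair in product(vocab, repeat=2)}

# bar: counts of the bigrams that actually occur in training.
bar = {}
for pair in zip(train, train[1:]):
    bar[pair] = bar.get(pair, 0) + 1

# Fold the observed counts into the adjusted table.
for key in foo:
    if key in bar:
        foo[key] += bar[key]

def prob(first, second):
    """Estimate P(second | first) from the adjusted counts in foo."""
    total = sum(foo[(first, w)] for w in vocab)
    return foo[(first, second)] / total
```

With this setup a bigram that never occurs in training, such as
("mat", "the"), still gets a small nonzero probability, so a product
of probabilities over test bigrams is not forced to 0. Since every key
in bar is also in foo, looping over bar and dropping the "if" would
produce the same table.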

And yes, I know it seems silly to wish for that document with the use
cases, but personally speaking, even if the thing is rather lengthy, I
would probably pick up better general techniques by reading through it
and seeing the examples.

I actually think that there would be a good market (if only in
mindshare) for a thorough examination of the power of lists, nested
lists, and dictionaries (with glorious examples) - something that
might appeal to a lot of non-full-time programmers who need to script
a lot but want to be efficient about it, yet don't want to deal with a
tutorial that unnecessarily covers all the aspects of Python.  My
$0.027 (having gone up due to the commodities markets).

Thanks again for the input, I do appreciate it!

Brandon