bags? 2.5.x?

MRAB google at mrabarnett.plus.com
Tue Jan 22 15:45:05 EST 2008


On Jan 21, 11:13 pm, Dan Stromberg <dstrombergli... at gmail.com> wrote:
> On Thu, 17 Jan 2008 18:18:53 -0800, Raymond Hettinger wrote:
> >> >> I keep wanting something like them - especially bags with something
> >> >> akin to set union, intersection and difference.
>
> >> > How about this recepie
> >> > <URL:http://www.ubookcase.com/book/Oreilly/
>
> >> The author of the bag class said that he was planning to submit bags
> >> for inclusion in 2.5 - is there a particular reason why they didn't go
> >> in?
>
> > Three reasons:
>
> > 1. b=collections.defaultdict(int) went a long way towards meeting the
> > need to for a fast counter.
>
> > 2. It's still not clear what the best API would be. What should list(b)
> > return for b.dict = {'a':3, 'b':0, 'c':-3}? Perhaps, [('a', 3), ('b',
> > 0), ('c', -3)] or ['a', 'a', 'a']
> > or ['a']
> > or ['a', 'b', 'c']
> > or raise an Error for the negative entry.
>
> I'd suggest that .keys() return the unique list, and that list() return
> the list of tuples.  Then people can use list comprehensions or map() to
> get what they really need.

I think that a bag is a cross between a dict (but the values are
always positive integers) and a set (but duplicates are permitted).

I agree that .keys() should the unique list, but that .items() should
return the tuples and list() should return the list of keys including
duplicates. bag() should accept an iterable and count the duplicates.

For example:

>>> sentence = "the cat sat on the mat"
>>> my_words = sentence.split()
>>> print my_words
['the', 'cat', 'sat', 'on', 'the', 'mat']
>>> my_bag = bag(my_words)
>>> print my_bag
bag({'on': 1, 'the': 2, 'sat': 1, 'mat': 1, 'cat': 1})
my_list = list(my_bag)
['on', 'the', 'the', 'sat', 'mat', 'cat']

It should be easy to convert a bag to a dict and also a dict to a bag,
raising ValueError if it sees a value that's not a non-negative
integer (a value of zero just means "there isn't one of these in the
bag"!).

>
> It might not be a bad thing to have an optional parameter on __init__
> that would allow the user to specify if they need negative counts or not;
> so far, I've not needed them though.
>
> > 3. I'm still working on it and am not done yet.
>
> Decent reasons.  :)
>
> Thanks!
>
> Here's a diff to bag.py that helped me.  I'd like to think these meanings
> are common, but not necessarily!
>
> $ diff -b /tmp/bag.py.original /usr/local/lib/bag.py
> 18a19,58
>
> >       def _difference(lst):
> >               left = lst[0]
> >               right = lst[1]
> >               return max(left - right, 0)
> >       _difference = staticmethod(_difference)
>
> >       def _relationship(self, other, operator):
> >               if isinstance(other, bag):
> >                       self_keys = set(self._data.keys())
> >                       other_keys = set(other._data.keys())
> >                       union_keys = self_keys | other_keys
> >                       #print 'union_keys is',union_keys
> >                       result = bag()
> >                       for element in list(union_keys):
> >                               temp = operator([ self[element], other
> [element] ])
> >                               #print 'operator is', operator
> >                               #print 'temp is', temp
> >                               result[element] += temp
> >                       return result
> >               else:
> >                       raise NotImplemented
>
> >       def union(self, other):
> >               return self._relationship(other, sum)
>
> >       __or__ = union
>
> >       def intersection(self, other):
> >               return self._relationship(other, min)
>
> >       __and__ = intersection
>
> >       def maxunion(self, other):
> >               return self._relationship(other, max)
>
> >       def difference(self, other):
> >               return self._relationship(other, self._difference)
>
> >       __sub__ = difference



More information about the Python-list mailing list