set using alternative hash function?

Austin Bingham austin.bingham at gmail.com
Thu Oct 15 07:52:36 EDT 2009


That's definitely a workable solution, but it still rubs me the wrong
way. The uniqueness criterion of a set seems, to me, like a property of
the set, whereas the Python model forces it onto each set element.

Another issue I have with the HashWrapper approach is its space
requirements. Logically, what I'm asking to do is switch out a single
function reference (i.e. to point at get_name() rather than hash()),
but in practice I'm forced to instantiate a new object for each of my
set members. On a large set, this could be disastrous.
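
To make that concrete, here is a rough sketch of the kind of container I
have in mind, where the key function belongs to the set rather than to
each element. KeyedSet, Part, and the sample data are hypothetical names
made up for illustration; nothing like this exists in the standard
library as far as I know. It keeps a plain dict from key(item) to item,
so it holds one function reference and creates no wrapper per element,
though it does still store each key value.

```python
class KeyedSet(object):
    """Set-like container whose uniqueness is decided by key(item)."""

    def __init__(self, key, items=()):
        self._key = key      # one function reference for the whole container
        self._items = {}     # maps key(item) -> item; no per-item wrapper
        for item in items:
            self.add(item)

    def add(self, item):
        # Keep the first item seen for each key, as set.add() does.
        self._items.setdefault(self._key(item), item)

    def discard(self, item):
        self._items.pop(self._key(item), None)

    def __contains__(self, item):
        return self._key(item) in self._items

    def __iter__(self):
        return iter(self._items.values())

    def __len__(self):
        return len(self._items)


class Part(object):
    # Hypothetical element type with a name and a serial number.
    def __init__(self, name, serial_number):
        self.name = name
        self.serial_number = serial_number


parts = [Part("bolt", 1), Part("bolt", 2), Part("nut", 3)]
by_name = KeyedSet(lambda p: p.name, parts)
by_name_and_serial = KeyedSet(lambda p: (p.name, p.serial_number), parts)
```

The same objects can sit in both containers under different uniqueness
rules, and the objects themselves never need a __hash__() of their own.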

Don't get me wrong...your solution is a good one, but it's just not
what I am looking for.

Austin

On Thu, Oct 15, 2009 at 1:36 PM, Chris Rebert <clp2 at rebertia.com> wrote:
> On Thu, Oct 15, 2009 at 4:24 AM, Austin Bingham
> <austin.bingham at gmail.com> wrote:
>> If I understand things correctly, the set class uses hash()
>> universally to calculate hash values for its elements. Is there a
>> standard way to have set use a different function? Say I've got a
>> collection of objects with names. I'd like to create a set of these
>> objects where the hashing is done on these names. Using the __hash__
>> function seems inelegant because it means I have to settle on one type
>> of hashing for these objects all of the time, i.e. I can't create a
>> set of them based on a different uniqueness criterion later. I'd like
>> to create a set instance that uses, say, 'hash(x.name)' rather than
>> 'hash(x)'.
>>
>> Is this possible? Am I just thinking about this problem the wrong way?
>> Admittedly, I'm coming at this from a C++/STL perspective, so perhaps
>> I'm just missing the obvious. Thanks for any help on this.
>
> You could use wrapper objects that define an appropriate __hash__():
>
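> #*completely untested*
> class HashWrapper(object):
>    def __init__(self, obj, criteria):
>        #bypass our own __setattr__, which proxies to the wrapped object
>        object.__setattr__(self, '_wrapee', obj)
>        object.__setattr__(self, '_criteria', criteria)
>
>    #override __hash__() / hash()
>    def __hash__(self):
>        return hash(self._criteria(self._wrapee))
>
>    #set membership also needs __eq__() to agree with __hash__()
>    def __eq__(self, other):
>        return self._criteria(self._wrapee) == other._criteria(other._wrapee)
>
>    #proxying code
>    def __getattr__(self, name):
>        return getattr(self._wrapee, name)
>
>    def __setattr__(self, name, val):
>        setattr(self._wrapee, name, val)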
> #example
> def name_of(obj):
>    return obj.name
>
> def name_and_serial_num(obj):
>    return obj.name, obj.serial_number
>
> no_same_names = set(HashWrapper(obj, name_of) for obj in some_collection)
> no_same_name_and_serial = set(HashWrapper(obj, name_and_serial_num)
>                               for obj in some_collection)
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com
>


