How to subclass sets.Set() to change intersection() behavior?

Gabriel Genellina gagsl-py at yahoo.com.ar
Tue Dec 12 23:46:53 EST 2006


At Tuesday 12/12/2006 23:23, mkppk wrote:

>I have kind of strange change I'd like to make to the sets.Set()
>intersection() method..
>
>Normally, intersection would return items in both s1 and s2 like with
>something like this:  s1.intersection(s2)
>
>I want the item matching to be a bit "looser".. that is, items in s2
>that match to just the beginning of items in s1 would be included in
>the result of intersection().
>
>I do not know how intersection() is implemented, so I just kinda
>guessed it might have something to do with how it compares set
>elements, probably using __eq__ or __cmp__. SO, I though if I override
>these methods, maybe magically that would affect the way intersection
>works.. so far, no luck =(

You got it the wrong way... That methods are used to compare two 
sets, not to compare their elements.
You don't have to modify set behavior, instead, you should modify how 
the set elements compare themselves. That is, you should inherit from 
str and implement some "fuzzy comparison" logic.

>- the lists I am working with are small, like 1-10 items each

For such small lists, perhaps the best way is to iterate along both 
lists, like in your example. But replace x.startswith(y) with 
x[:len(y)]==y which is faster. Also, don't you have to test the other 
way too? y.startswith(x)

># this is the list implementation I am trying to avoid because I am
>under the impression using set  would be faster..(??)
># please let me know if I am wrong about that assumption
>
>L1=['macys','installment','oil','beans']
>L2=['macy','oil','inst','coffee']
>
>L3=[]
>for x in L1:
>         for y in L2:
>                 if x.startswith(y):
>                         L3.append(y)
>
># prints ['macy', 'inst', 'oil']
>print L3

You can use the timeit module to measure performance.

Just for fun -because I don't think it would be better for small sets 
as you have- this is an implementation of a "fuzzystring" class which 
only compares its first character.

class fuzzystr(str):

     """A fuzzy string. Only takes its first character into account 
when comparing.
     That is, fuzzystr('abc')==fuzzystr('add')"""

     def __cmp__(self, other):
         if not isinstance(other, basestring): return -1 # always < 
any other thing
         if not self: return len(other) and -1 or 0
         if not other: return 1
         return cmp(self[0], other[0])

     def __eq__(self, other): return self.__cmp__(other)==0
     def __ne__(self, other): return self.__cmp__(other)!=0
     def __lt__(self, other): return self.__cmp__(other)<0
     def __le__(self, other): return self.__cmp__(other)<=0
     def __gt__(self, other): return self.__cmp__(other)>0
     def __ge__(self, other): return self.__cmp__(other)>=0

     def __hash__(self):
         # This must hold for all instances: x==y => hash(x)==hash(y)
         if self: return hash(self[0])
         return hash('')

try: set
except NameError: from sets import Set as set

s1=set(map(fuzzystr,['macys','installment','oil','beans']))
s2=set(map(fuzzystr,['macy','oil','inst','coffee']))
assert s1.intersection(s2) == set(map(fuzzystr,['macy','inst','oil']))


-- 
Gabriel Genellina
Softlab SRL 

__________________________________________________
Correo Yahoo!
Espacio para todos tus mensajes, antivirus y antispam ¡gratis! 
¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar



More information about the Python-list mailing list