custom classes in sets

Carl Banks invalidemail at aerojockey.com
Mon Feb 14 05:17:32 EST 2005


vegetax wrote:
> Steven Bethard wrote:
>
> > vegetax wrote:
> >> How can i make my custom class an element of a set?
> >>
> >> class Cfile:
> >>   def __init__(s,path): s.path = path
> >>
> >>   def __eq__(s,other):
> >>    print 'inside equals'
> >>    return not os.popen('cmp %s %s' % (s.path,other.path)).read()
> >>
> >>   def __hashcode__(s): return s.path.__hashcode__()
> >>
> >> the idea is that it accepts file paths and construct a set of unique
> >> files (the command "cmp" compares files byte by byte.),the files can
> >> have different paths but the same content
> >>
> >> but the method __eq__ is never called

[snip]

> I just tried and it wont be called =(, so how can i generate a hash code for
> the CFile class? note that the comparitions(__eq__) are done based on the
> contents of a file using the command 'cmp', i guess thats not posible but
> thanks.


Let me suggest that, if your idea is to get a set of files all with
unique file contents, comparing a file byte-by-byte with each file
already in the set is going to be absurdly inefficient.

Instead, I recommend comparing md5 (or sha) digests.  The idea is, you
read in each file once, calculate an md5 digest, and compare the
digests instead of the file contents.

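. import md5
.
. class Cfile:
.     def __init__(self, path):
.         self.path = path
.         # Digest the whole file once, up front.  Binary mode avoids
.         # newline translation changing the bytes that get hashed.
.         self.md5 = md5.new(open(path, 'rb').read()).digest()
.     def __eq__(self, other):
.         # Two Cfiles are equal when their content digests match.
.         return self.md5 == other.md5
.     def __hash__(self):
.         # Hash on the digest so equal objects land in the same bucket.
.         return hash(self.md5)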

This is kind of hackish (not to mention untested).  You would probably
do better to mmap the file (see the mmap module) rather than read it.
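
For what it's worth, here is a rough sketch of the mmap variant.  It
assumes a newer Python where the hashlib module stands in for the md5
module, and file_digest is just a helper name I made up for
illustration.  Empty files get special treatment because mmap refuses
to map a zero-length file.

```python
import hashlib
import mmap
import os


def file_digest(path):
    """Return the MD5 digest of a file's contents (hypothetical helper)."""
    # mmap cannot map an empty file, so hash the empty string instead.
    if os.path.getsize(path) == 0:
        return hashlib.md5(b"").digest()
    with open(path, "rb") as f:
        # Map the file read-only and let hashlib read from the mapping.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return hashlib.md5(mm).digest()


class Cfile:
    def __init__(self, path):
        self.path = path
        self.md5 = file_digest(path)

    def __eq__(self, other):
        if not isinstance(other, Cfile):
            return NotImplemented
        return self.md5 == other.md5

    def __hash__(self):
        return hash(self.md5)
```

With this, set(Cfile(p) for p in paths) should keep one Cfile per
distinct file content, whichever paths they came from.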

And, in case you're wondering: yes it is theoretically possible for
different files to have the same md5.  However, the chances are
microscopic.  (Incidentally, the SCons build system uses MD5 to decide
if a file has been modified.)
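
If even that microscopic chance bothers you, one option is to treat a
digest match as "probably equal" and confirm with a byte-by-byte
comparison only in that case.  A minimal sketch, assuming a newer
Python with hashlib and filecmp; md5_of and same_contents are names
invented here for illustration:

```python
import filecmp
import hashlib


def md5_of(path):
    """Return the MD5 digest of a file's contents (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).digest()


def same_contents(path_a, path_b):
    """Digest check first; byte-by-byte check only if the digests match."""
    if md5_of(path_a) != md5_of(path_b):
        # Different digests always mean different contents.
        return False
    # shallow=False forces filecmp to compare the actual file bytes.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

The expensive comparison then runs only for files that already look
identical, not for every pair.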


-- 
CARL BANKS



