Calculate sha1 hash of a binary file

Nikolaus Rath Nikolaus at rath.org
Thu Aug 7 03:08:28 EDT 2008


LaundroMat <Laundro at gmail.com> writes:
> Hi -
>
> I'm trying to calculate unique hash values for binary files,
> independent of their location and filename, and I was wondering
> whether I'm going in the right direction.
>
> Basically, the hash values are calculated thusly:
>
> f = open('binaryfile.bin')
> import hashlib
> h = hashlib.sha1()
> h.update(f.read())
> hash = h.hexdigest()
> f.close()
>
> A quick try-out shows that effectively, after renaming a file, its
> hash remains the same as it was before.
>
> I have my doubts however as to the usefulness of this. As f.read()
> does not seem to read until the end of the file (for a 3.3MB file only
> a string of 639 bytes is being returned, perhaps a 00-byte counts as
> EOF?), is there a high danger of collision?
>
> Are there better ways of calculating hash values of binary files?


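Apart from opening the file in binary mode (open('binaryfile.bin',
'rb')), I would consider reading the file and updating the hash in
chunks of, e.g., 512 KB. In text mode, some platforms (notably Windows)
stop reading at a Ctrl-Z (0x1A) byte, which may explain the short read
you are seeing. The above code is also likely to perform badly for
sufficiently large files, since it tries to read the entire file into
memory at once.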
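
Something along these lines might do, as a rough sketch. The function
name sha1_of_file and the chunk_size parameter are made up for
illustration, and the code assumes a Python version that has the
`with` statement:

```python
import hashlib


def sha1_of_file(path, chunk_size=512 * 1024):
    """Return the SHA-1 hex digest of the file at `path`.

    `sha1_of_file` and `chunk_size` are illustrative names, not part
    of any library.
    """
    h = hashlib.sha1()
    # 'rb' reads raw bytes: no newline translation and no
    # platform-specific end-of-file handling in the middle of the data.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty result means the end of the file was reached.
                break
            h.update(chunk)
    return h.hexdigest()
```

Calling sha1_of_file('binaryfile.bin') should give the same digest as
hashing the whole content in one go, but memory use stays bounded by
chunk_size however large the file is.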


Best,

   -Nikolaus

-- 
 »It is not worth an intelligent man's time to be in the majority.
  By definition, there are already enough people to do that.«
                                                         -G.H. Hardy

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C


