binary file comparison with the md5 module

Remco Gerlich scarblac at pino.selwerd.nl
Thu Jun 14 02:30:02 EDT 2001


Christian Reyes <christian at rocketnetwork.com> wrote in comp.lang.python:
> I'm trying to write a script that takes two binary files and returns whether
> or not their data is completely matching.
> 
> One of my peers suggested that an efficient way to do this would be to run
> the md5 algorithm on each file and then compare the resultant output.  Since
> md5 returns a unique 128-bit checksum of its input, this should
> theoretically work.
> 
> The problem I'm having is with reading the binary file in as a string.
> 
> I tried opening the file with the built-in python open command, and then
> reading the contents of the file into a buffer.  But I think my problem is
> that when I read the binary file into a buffer, the contents get tweaked
> somehow.  I would expect the print statement to give me some huge string of
> gibberish but instead what I get is 'RIFFnap'.  Regardless of what size the
> file is.  I'll try to read in a 5 meg file and all I get when I try to print
> the buffer is some variation of 'RIFFxxx' (where xxx is any arbitrary set of
> 3 characters).
> 
> >>> x = open('d:\\binary.wav')

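You need to open the file in binary mode. On Windows, a file opened in
text mode has its line endings translated, and reading can stop at the
first Ctrl-Z (0x1A) byte, which would explain why you only get a few
characters back no matter how big the file is: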

x = open("d:\\binary.wav", "rb")
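
For the comparison itself, here is a rough sketch of how it could look once
the files are opened in binary mode. It assumes a current Python, where the
hashing lives in hashlib rather than the old md5 module, and the names
file_digest, same_contents and chunk_size are made up for illustration.
Reading in chunks keeps a 5 meg file from having to sit in memory all at once.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in binary mode."""
    h = hashlib.md5()
    # "rb" is the important part: no newline translation, no early EOF.
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the loop.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_contents(path_a, path_b):
    """True if both files hash to the same MD5 digest."""
    return file_digest(path_a) == file_digest(path_b)
```

Note that equal digests make identical contents very likely but do not
strictly prove it, since different inputs can share an MD5 value. If you
need a direct byte-for-byte check, the standard library's
filecmp.cmp(a, b, shallow=False) should do that without hashing.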

-- 
Remco Gerlich


