md5 and large files

Nelson Minar nelson at monkey.org
Sun Oct 17 21:07:25 EDT 2004


Brad Tilley <rtilley at vt.edu> writes:
> I would like to verify that the files are not corrupt so what's the
> most efficient way to calculate md5 sums on 4GB files? The machine
> doing the calculations is a small desktop with 256MB of RAM.

If all you want to do is verify that a file is not corrupt, MD5 is the
wrong algorithm to use. Use something fast like crc32.
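A minimal sketch of the crc32 approach using the standard library's zlib.crc32, which takes a running value as its second argument. The function name crc32_of_file and the 1 MB chunk size are illustrative assumptions. Because the file is read in fixed-size chunks, memory use stays bounded no matter how large the file is.

```python
import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: CRC-32 of a whole file, read in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size bytes in memory
            if not chunk:               # empty bytes object means end of file
                break
            crc = zlib.crc32(chunk, crc)  # continue from the previous value
    return crc & 0xFFFFFFFF  # normalise to an unsigned 32-bit value
```

Compute the value once when the file is known to be good, then compare it with a later run to detect corruption.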

If you're worried about corruption anywhere in the file, then testing
the first 4k isn't going to help you very much.
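To catch corruption anywhere, the checksum has to cover every byte, but the file never needs to be loaded whole. Here is a sketch that streams the file through an MD5 object. It assumes the modern hashlib module (newer than this post; older Pythons had a separate md5 module). The function name md5_of_file and the chunk size are hypothetical choices.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: hex MD5 digest of a whole file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)  # feed each chunk; only one chunk is held at a time
    return digest.hexdigest()
```

On a 256MB machine this hashes a 4GB file using roughly one chunk of memory. The run time is then dominated by disk I/O rather than by Python.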

If you really need it to be efficient, don't use Python. Use a native
program such as md5sum or sum.

If this is your homework, you'll learn a lot more by figuring it out
yourself.


