Efficient checksum calculation on large files

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Thu Feb 10 13:33:24 EST 2005


On 09 Feb 2005 10:31:22 GMT, rumours say that Nick Craig-Wood
<nick at craig-wood.com> might have written:

>Fredrik Lundh <fredrik at pythonware.com> wrote:
>>  on my machine, Python's md5+mmap is a little bit faster than
>>  subprocess+md5sum:
>> 
>>      import os, md5, mmap
>> 
>>      file = open(fn, "rb")
>>      size = os.path.getsize(fn)
>>      hash = md5.md5(mmap.mmap(file.fileno(), size,
>>                               access=mmap.ACCESS_READ)).hexdigest()
>> 
>>  (I suspect that md5sum also uses mmap, so the difference is
>>  probably just the subprocess overhead)
>
>But you won't be able to md5sum a file bigger than about 4 Gb if using
>a 32bit processor (like x86) will you?  (I don't know how the kernel /
>user space VM split works on windows but on linux 3Gb is the maximum
>possible size you can mmap.)

Indeed... but the context was efficiently calculating checksums for large files
to be /served/ by a webserver.  I'd guess it's almost certain that the files
won't be larger than 3GiB, but ICBW :)
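For files that do exceed what a 32-bit address space can mmap, the usual
workaround is to feed the hash in fixed-size chunks, which keeps memory use
constant regardless of file size.  A minimal sketch (using the modern
hashlib module rather than the Python 2-era md5 module quoted above; the
function name and chunk size are my own choices):

```python
import hashlib

def md5_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file by reading it in
    chunk_size-byte pieces, so arbitrarily large files can be
    hashed without mmap'ing them into the address space."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # EOF
                break
            h.update(chunk)
    return h.hexdigest()
```

This trades the kernel-managed paging of mmap for explicit read() calls; in
practice the difference is small next to the cost of hashing itself.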
-- 
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...


