Efficient checksum calculation on large files
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Thu Feb 10 13:33:24 EST 2005
On 09 Feb 2005 10:31:22 GMT, rumours say that Nick Craig-Wood
<nick at craig-wood.com> might have written:
>Fredrik Lundh <fredrik at pythonware.com> wrote:
>> on my machine, Python's md5+mmap is a little bit faster than
>> subprocess+md5sum:
>>
>> import os, md5, mmap
>>
>> file = open(fn, "r+b")
>> size = os.path.getsize(fn)
>> hash = md5.md5(mmap.mmap(file.fileno(), size)).hexdigest()
>>
>> (I suspect that md5sum also uses mmap, so the difference is
>> probably just the subprocess overhead)
>
>But you won't be able to md5sum a file bigger than about 4 Gb if using
>a 32bit processor (like x86) will you? (I don't know how the kernel /
>user space VM split works on windows but on linux 3Gb is the maximum
>possible size you can mmap.)
Indeed... but the context was efficiently calculating checksums for large files
to be /served/ by a webserver. I deduce it's almost certain that the files
won't be larger than 3 GiB, but ICBW :)
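
For files that might exceed the mmap address-space limit, reading in fixed-size
chunks sidesteps the problem entirely while keeping memory use constant. A
minimal sketch (using hashlib, the modern replacement for the old md5 module;
the function name and chunk size are my own choices):

```python
import hashlib

def file_md5(fn, chunk_size=1 << 20):
    # Feed the hash 1 MiB at a time: works for files of any size,
    # with no dependence on the 32-bit mmap limit discussed above.
    h = hashlib.md5()
    with open(fn, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()
```

The result is identical to hashing the whole file in one go, since MD5 (like
all hashlib digests) can be updated incrementally.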
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...