checksum problem

Tue Jan 30 14:21:42 EST 2018

On Tue, 30 Jan 2018 11:24:07 +0100, jak <please at nospam.tnx> wrote:
> Hello everybody,
> I'm using python 2.7.14 and calculating the checksum with the sha1 
> algorithm and this happens: the checksum is wrong until I read the whole 
> file in one shot. Here is a test program:
>
> import hashlib
>
> def Checksum(fname, blocks):
>      m = hashlib.sha1()
>      print "sha1 block size: " + str(m.block_size * blocks)
>      with open(fname, "rb") as fh:
>          for data in fh.read(m.block_size * blocks):
>              m.update(data)
>      return m.hexdigest()
>
> def main():
>      for b in range(10, 260, 10):
>          print str(b) + ': ' + Checksum("d:/upload_688df390ea0bd728fdbeb8972ae5f7be.zip", b)
>
> if __name__ == '__main__':
>      main()
>
> and this is the result output:
>
> sha1 block size: 640
> 10: bf09de3479b2861695fb8b7cb18133729ef00205
> sha1 block size: 1280
> 20: 71a5499e4034fdcf0eb0c5d960c8765a8b1f032d
> .
> .
> .
> sha1 block size: 12160
> 190: 956d017b7ed734a7b4bfdb02519662830dab4fbe
> sha1 block size: 12800
> 200: 1b2febe05b70f58350cbb87df67024ace43b76e5
> sha1 block size: 13440
> 210: 93832713edb40cf4216bbfec3c659842fbec6ae4
> sha1 block size: 14080
> 220: 93832713edb40cf4216bbfec3c659842fbec6ae4
> .
> .
> .
>
> the file size is 13038 bytes and its checksum is 
> 93832713edb40cf4216bbfec3c659842fbec6ae4
>
> Why do I get these results? What am I doing wrong?
>
> Thanks to everyone in advance.

I believe your "for data in fh.read(...)" loop calls fh.read only once,
so it reads just the first block of the file, and then loops over the
bytes in that block (calling m.update once for each byte, probably the
least efficient approach imaginable).  The remainder of the file is
never hashed.  That's why you start getting the right answer once the
first block is big enough to encompass the whole file: 13440 bytes is
the first size in your output that exceeds the 13038-byte file.
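
Here is a minimal sketch of one way to fix it, assuming you still want to
read in multiples of m.block_size: call fh.read repeatedly until it
returns an empty string, and pass each chunk to m.update whole.  The
name "checksum" and the default of 64 blocks are only my choices for
illustration, not anything from your program.

```python
import hashlib

def checksum(fname, blocks=64):
    """Return the SHA-1 hex digest of fname, read in fixed-size chunks."""
    m = hashlib.sha1()
    chunk_size = m.block_size * blocks
    with open(fname, "rb") as fh:
        # iter(callable, sentinel) calls fh.read(chunk_size) again and
        # again until it returns the empty string, which signals EOF.
        # Each chunk goes to update() whole, not one byte at a time.
        for data in iter(lambda: fh.read(chunk_size), b""):
            m.update(data)
    return m.hexdigest()
```

With this loop the digest should no longer depend on the value of
"blocks", since every byte of the file reaches m.update regardless of
the chunk size.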

-- 
To email me, substitute nowhere->runbox, invalid->com.