checksum problem

Chris Angelico rosuav at gmail.com
Tue Jan 30 14:37:09 EST 2018


On Wed, Jan 31, 2018 at 6:21 AM, Peter Pearson
<pkpearson at nowhere.invalid> wrote:
> On Tue, 30 Jan 2018 11:24:07 +0100, jak <please at nospam.tnx> wrote:
>>      with open(fname, "rb") as fh:
>>          for data in fh.read(m.block_size * blocks):
>>              m.update(data)
>>      return m.hexdigest()
>>
>
> I believe your "for data in fh.read" loop just reads the first block of
> the file and loops over the bytes in that block (calling m.update once
> for each byte, probably the least efficient approach imaginable),
> omitting the remainder of the file.  That's why you start getting the
> right answer when the first block is big enough to encompass the whole
> file.

Correct analysis.
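To see the effect concretely, here is a hypothetical reconstruction of the buggy loop, run against an in-memory file. The function name buggy_checksum, the use of MD5 and the block sizes are assumptions, since the quoted fragment doesn't show which hash object m is. In Python 3, iterating over a bytes object yields ints, which update() rejects. The sketch therefore wraps each int back into a one-byte bytes object so that it mimics the per-byte updates described above.

```python
import hashlib
import io

def buggy_checksum(fh, block_size):
    # Hypothetical reconstruction of the quoted loop (MD5 assumed).
    m = hashlib.md5()
    # fh.read(block_size) is called exactly once; the for loop then walks
    # the bytes of that single chunk, so the rest of the file is never hashed.
    for byte in fh.read(block_size):
        # Iterating bytes yields ints in Python 3; rewrap each as 1-byte bytes.
        m.update(bytes([byte]))
    return m.hexdigest()

payload = b"x" * 10_000
full = hashlib.md5(payload).hexdigest()

small = buggy_checksum(io.BytesIO(payload), 4096)   # block smaller than file
big = buggy_checksum(io.BytesIO(payload), 65536)    # block covers whole file

print(small == full)  # False: only the first 4096 bytes were hashed
print(big == full)    # True: the single read happened to cover everything
```

This reproduces the symptom in the original report: the checksum only comes out right once the block is large enough to swallow the whole file.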

Generally, if you want to read a file in chunks, the easiest way is this:

while "moar data":
    data = fh.read(block_size)
    if not data: break
    m.update(data)

That should get you the correct result regardless of your block size,
and then you can tweak the block size to toy with performance.
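A self-contained version of the loop above, wrapped in a function, might look like the following. The name file_checksum, the 64 KiB default block size and the MD5 default are assumptions for illustration; the thread doesn't say which hash the original code used. The self-check at the bottom hashes the same temporary file with several block sizes and confirms that they all agree with hashing the whole payload at once.

```python
import hashlib
import os
import tempfile

def file_checksum(fname, block_size=65536, algorithm="md5"):
    """Hash a file chunk by chunk so it never has to fit in memory at once."""
    m = hashlib.new(algorithm)
    with open(fname, "rb") as fh:
        while True:
            data = fh.read(block_size)
            if not data:  # read() returns b"" at end of file
                break
            m.update(data)
    return m.hexdigest()

# Quick self-check: the digest must not depend on the block size.
payload = os.urandom(100_000)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    expected = hashlib.md5(payload).hexdigest()
    results = [file_checksum(tmp.name, size) for size in (7, 64, 4096, 1 << 20)]
    print(all(r == expected for r in results))  # True
finally:
    os.remove(tmp.name)
```

The block size only affects how many read() and update() calls are made, not the result. On Python 3.11 and later, hashlib.file_digest() offers a ready-made way to hash an open binary file, if you'd rather not write the loop yourself.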

ChrisA


