[Tutor] regarding checksum

Peter Otten __peter__ at web.de
Wed Oct 26 04:34:41 EDT 2016


Clayton Kirkwood wrote:

> Small problem:
> Import zlib
> For file in files:
>     checksum = zlib.adler32(file)
> 
> traceback
>     checksum = zlib.adler32(file)
> TypeError: a bytes-like object is required, not 'str'
> 
> Obvious question, how do I make a bytes-like object. I've read through the
> documentation and didn't find a way to do this.

A checksum is calculated for a sequence of bytes (integers in the range 
0 to 255), but there are many ways to translate a string into such a byte 
sequence. As an example, let's convert "mañana", first using UTF-8,

>>> list("mañana".encode("utf-8"))
[109, 97, 195, 177, 97, 110, 97]

then latin1:

>>> list("mañana".encode("latin-1"))
[109, 97, 241, 97, 110, 97]

So which sequence should the checksum algorithm choose?
Instead of picking one at random it insists on getting bytes and requires 
the user to decide:

>>> import zlib
>>> zlib.adler32("mañana".encode("utf-8"))
238748531
>>> zlib.adler32("mañana".encode("latin1"))
178062064
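
If you really do want a checksum of a piece of text, one way to keep that 
decision visible is a small helper that takes the encoding as a parameter. 
This is only a sketch; the name checksum_text and the UTF-8 default are my 
own assumptions, not something zlib provides:

```python
import zlib


def checksum_text(text, encoding="utf-8"):
    """Return the Adler-32 checksum of text encoded with the given encoding.

    Hypothetical helper: the caller picks the encoding explicitly,
    because different encodings give different bytes and therefore
    different checksums.
    """
    return zlib.adler32(text.encode(encoding))


print(checksum_text("mañana"))             # bytes produced by UTF-8
print(checksum_text("mañana", "latin-1"))  # bytes produced by Latin-1
```

The two calls print the same two numbers as the interactive session above, 
one per encoding.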

However, your for loop

> For file in files:
>     checksum = zlib.adler32(file)

suggests that you are interested in the checksum of the files' contents. To 
get the bytes in the file you have to read the file in binary mode:

>>> files = "one", "two"
>>> for file in files:
...     with open(file, "rb") as f:
...         print(zlib.adler32(f.read()))
... 
238748531
178062064
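
For large files you may not want to read everything into memory at once. 
zlib.adler32() accepts an optional second argument, a running checksum, so 
the file can be processed piece by piece. A minimal sketch, assuming a 
hypothetical helper name file_adler32 and an arbitrary chunk size of 64 KiB:

```python
import zlib


def file_adler32(path, chunk_size=65536):
    """Return the Adler-32 checksum of a file's contents, read in chunks."""
    checksum = 1  # starting value zlib uses when none is passed
    with open(path, "rb") as f:  # binary mode: we want bytes, not str
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            # Feed the previous result back in to continue the checksum.
            checksum = zlib.adler32(chunk, checksum)
    return checksum
```

The result should match zlib.adler32(f.read()) for the same file; only the 
memory use differs.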



