MD5 module Pythonicity

Leandro Lameiro lameiro at gmail.com
Sat Oct 15 01:23:13 EDT 2005


Hi folks

Recently I have been discussing Python's ease of use with a friend,
and Python really is good at this. This friend needed to calculate
the MD5 hash of some files and was telling me about the md5 module.
From the way he described it, and from how it is described in the
Python docs, the method for calculating hashes did not seem very
pythonic to me, although it is certainly simple and easy.

The method is (taken from the official Python documentation):

>>> import md5
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'

The idea for files is: open the file, read it in small chunks,
call update() for each one, and when you are done reading the
file, call digest(). Well, OK, it is very simple and easy.
But wouldn't it be more pythonic if there were some kind of
md5.calculate_from_file("file")?
This way, you wouldn't have to split the file yourself (the
proposed function would do that for you), and the code would be a lot
more readable:

>>> import md5
>>> md5.calculate_from_file("/home/foo/bar.bz2")

or something like this. (Maybe calculate_from_file could take an open
file object instead of the file name.)
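
A minimal sketch of what such a helper might look like. Note the
assumptions: calculate_from_file is only the hypothetical name proposed
above, not an existing library function; the chunk_size parameter and its
default are made up for illustration; and the sketch uses hashlib.md5
from current Python in place of md5.new, since the md5 module has since
been superseded by hashlib.

```python
import hashlib


def calculate_from_file(source, chunk_size=64 * 1024):
    """Hypothetical helper: return the MD5 hex digest of a file.

    `source` is either a file name or a file object opened in
    binary mode. The file is read in pieces of `chunk_size` bytes,
    so it is never loaded into memory all at once.
    """
    # A file name was given: open it and hash the resulting file object.
    if isinstance(source, (str, bytes)):
        with open(source, "rb") as f:
            return calculate_from_file(f, chunk_size)

    m = hashlib.md5()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means end of file
            break
        m.update(chunk)
    return m.hexdigest()
```

With something like this, hashing a file would be a single call such as
calculate_from_file("/home/foo/bar.bz2"), and memory use would stay
bounded by chunk_size however big the file is.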

One alternative also shown in the documentation is to do everything at once:

>>> import md5
>>> md5.new("Nobody inspects the spammish repetition").digest()

Well, OK, this one is a bit more readable (though not as good as I
think it could be), but applied to a file it has the disadvantage of
having to load the WHOLE file into memory.
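
As a small check that the two forms really are interchangeable, here is a
sketch comparing them on the string from the documentation example. It
assumes a current Python, where hashlib.md5 stands in for md5.new and the
input must be bytes; the variable names are only illustrative.

```python
import hashlib

# All at once: the whole input is handed over in a single call.
whole = hashlib.md5(b"Nobody inspects the spammish repetition").hexdigest()

# Incrementally: the same input fed in two pieces.
m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
pieces = m.hexdigest()

print(whole == pieces)  # True
```

So feeding the data piece by piece costs nothing in correctness; the
only difference is how much has to sit in memory at one time.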

What's wrong with having a function like the one I described, which
would split the file for you, feed each chunk to update() and, when it
is done, return the digest?
It is easier, doesn't require creating MD5 objects yourself, works well
on small and big files, and makes the code simpler and more readable.
Also, calculating the MD5 of a file seems to be a common enough task to
be put in the library (well, at least on GNU/Linux we have a command
just for this: md5sum).

"Although practicality beats purity."
"Readability counts."
"Beautiful is better than ugly."

Have I got the wrong definition of "Pythonic"?

--
Thanks in advance
Regards
Leandro Lameiro


