Dict when defining not returning multi value key error

Chris Angelico rosuav at gmail.com
Fri Aug 1 22:13:27 EDT 2014


On Sat, Aug 2, 2014 at 11:47 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> - It relies on the checksum being unpredictable, to prevent substitution
> attacks: you're expecting object x with checksum a, but somebody
> substitutes object y with checksum a instead.

Note that this requirement is only an issue when there are actual
attacks involved. In many cases, hashing is used to detect either
errors in transfer (e.g. truncated files) or meaningfully different
files (e.g. MP3s of different songs). A collision between those won't be
the result of someone deliberately crafting a file with the right
checksum; it'll happen only by chance, so md5sum can be safely used
there.
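For that non-adversarial case, a minimal sketch with Python's hashlib might look like this. The byte strings are made-up stand-ins for real files, and the helper names md5_hex and md5_of_file are hypothetical. The point is only that a copy truncated in transit produces a different digest.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of an in-memory byte string as hex."""
    return hashlib.md5(data).hexdigest()

def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks, so large files needn't fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Made-up contents: an intact copy and one truncated in transit.
original = b"pretend this is an MP3 " * 1000
truncated = original[:-100]

print(md5_hex(original) == md5_hex(original))   # True
print(md5_hex(original) == md5_hex(truncated))  # False: the copy is damaged
```

This only catches accidental damage; nothing in it stops someone who is deliberately crafting a file.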

But if you're trying to use this to prove that a file was downloaded
correctly, you do need to worry about that. If I say that you can
download the binary of my program from <this URL>, and that the MD5
checksum is 123456...DEF, then someone could do a DNS hack (cache
poisoning, proxy interference, whatever) to capture your attempted
download, and send you instead to his own server, where he has a
carefully crafted binary that does everything mine does, plus it tells
him all your passwords - and it has arbitrary junk buried in it to
make sure the MD5 sum matches. So an MD5 checksum is broken for
verifying anything downloaded from the internet, but is still quite
usable for the accidental-damage cases above.
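Because practical MD5 collision attacks exist, the usual advice for downloads is a hash with no known practical attacks, such as SHA-256 (also in hashlib), checked against a value you got from a channel the attacker can't also tamper with. A minimal sketch under those assumptions; sha256_of_file and verify_download are hypothetical helper names, not any library's API.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file with SHA-256 in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-256 with a published hex digest.

    The published value must come from a trusted source (e.g. the
    author's HTTPS site); if the attacker can swap the checksum along
    with the binary, no choice of hash function helps.
    """
    # Normalize whitespace and case so a pasted checksum still matches.
    return sha256_of_file(path) == published_hex.strip().lower()
```

Usage would be something like verify_download("myprogram.bin", "<checksum from the author's site>"), where the file name is made up. A plain == is fine here because the checksum is public; there's no secret to leak through timing.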

There is one aspect of the unpredictability that's important even in
the simple cases, though, and that's the avalanche effect. If anything
changes in the file, the whole hash should completely and arbitrarily
change. That means you don't need to stare at the whole hash, trying
to see if that 8 became a 0; any change to the file will probably make
the first few digits visibly different, so it's obvious at a glance
that the hash has changed.
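A quick way to see the avalanche effect with hashlib: change one character of the input and count how many of the 32 hex digits of the MD5 digest differ. The sample strings and the differing_hex_digits helper are just for illustration. With a well-behaved hash each output bit flips with probability about 1/2, so each hex digit changes with probability about 15/16 and you'd expect roughly 30 of 32 to differ; the exact count depends on the inputs.

```python
import hashlib

def differing_hex_digits(a: str, b: str) -> int:
    """Count positions where two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))

before = b"Hello, world!"
after = b"Hello, world?"  # exactly one character changed

h1 = hashlib.md5(before).hexdigest()
h2 = hashlib.md5(after).hexdigest()

print(h1)
print(h2)
print(differing_hex_digits(h1, h2), "of", len(h1), "hex digits differ")
```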

ChrisA
