Help wanted with md2 hash algorithm

Tom Anderson twic at urchin.earth.li
Tue Jan 10 13:13:59 EST 2006


On Sun, 8 Jan 2006, Tom Anderson wrote:

> On Fri, 6 Jan 2006 wjb131 at web.de wrote:
>
>> below you find my simple python version of MD2 algorithm as described 
>> in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results 
>> for strings shorter than 16 Bytes and wrong results for longer strings.
>
> I guess the thing to do is extract the C code from the RFC and compile 
> it, verify that it works, then stick loads of print statements in the C 
> and the python, to see where the states of the checksum engines diverge.

Okay, i've done this. I had to fiddle with the source a bit - added a 
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and 
took the corresponding includes out of md2c.c and mddriver.c (to avoid 
duplicate definitions) - but after that, it built cleanly with:

gcc -DMD=2 *.c *.h -o mddriver

A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it 
even built cleanly with -Wall.

Running the test suite with mddriver -x gives results matching the test 
vectors in the RFC - a good start!

Patching the code to dump the checksums immediately after updating with 
the pad, and before updating with the checksum:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8

And here's my python code with the same modification, running the test 
suite:

*** checksum after padding =  623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding =  19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding =  19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding =  56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding =  539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding =  365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding =  9acf39425d22c4e3b4ddbdc563d23716
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf

As you can see, the checksums start to go wrong when we hit 16 bytes.

So, let us turn our attention to the checksum function.

Here's the python i wrote:

def checksum_old(c, buf): # c is checksum array, buf is input block
 	l = c[-1]
 	for i in xrange(digest_size):
 		l = S[(buf[i] ^ l)]
 		c[i] = l

Here's the C from the RFC:

   unsigned int i, j, t;
   t = checksum[15];
   for (i = 0; i < 16; i++)
     t = checksum[i] ^= PI_SUBST[block[i] ^ t];

Spot the difference. Yes, the assignment into the checksum array is a ^=, 
not a straight = - checksum bytes get set to 
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor 
accumulator). Translating that into python, we get:

def checksum(c, buf):
 	l = c[-1]
 	for i in xrange(digest_size):
 		l = S[(buf[i] ^ l)] ^ c[i]
 		c[i] = l

And when we put that back into the code, we get the right digests out. 
Victory!

However, here's what the pseudocode in the RFC says:

          For j = 0 to 15 do
             Set c to M[i*16+j].
             Set C[j] to S[c xor L].
             Set L to C[j].
           end /* of loop on j */

I certainly don't see any sign of a xor with the 
current-value-of-checksum-byte in there - it looks like the C and 
pseudocode in the RFC don't match up.

And, yes, googling for "RFC 1319 errata" brings up a report correcting 
this. They really ought to amend RFCs to mention errata!

Correct code here:

http://urchin.earth.li/~twic/md2.py

tom

-- 
Mathematics is the door and the key to the sciences. -- Roger Bacon



More information about the Python-list mailing list