Help wanted with md2 hash algorithm

Thu Jan 12 02:29:53 EST 2006

Tom Anderson wrote:
> On Sun, 8 Jan 2006, Tom Anderson wrote:
>
> > On Fri, 6 Jan 2006 wjb131 at web.de wrote:
> >
> >> below you find my simple python version of MD2 algorithm as described
> >> in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results
> >> for strings shorter than 16 Bytes and wrong results for longer strings.
> >
> > I guess the thing to do is extract the C code from the RFC and compile
> > it, verify that it works, then stick loads of print statements in the C
> > and the python, to see where the states of the checksum engines diverge.
>
> Okay, i've done this. I had to fiddle with the source a bit - added a
> #include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
> took the corresponding includes out of md2c.c and mddriver.c (to avoid
> duplicate definitions) - but after that, it built cleanly with:
>
> gcc -DMD=2 *.c *.h -o mddriver
>
> A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
> even built cleanly with -Wall.
>
> Running the test suite with mddriver -x gives results matching the test
> vectors in the RFC - a good start!
>
> Patching the code to dump the checksums immediately after updating with
> the pad, and before updating with the checksum:
>
> *** checksum after padding = 623867b6af52795e5f214e9720beea8d
> MD2 ("") = 8350e5a3e24c153df2275c9f80692773
> *** checksum after padding = 19739cada3ba281693348e9d256fff31
> MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
> *** checksum after padding = 19e29d1b7304368e595a276f302f57cc
> MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
> *** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
> MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
> *** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
> MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
> *** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
> MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd
> *** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
> MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8
>
> And here's my python code with the same modification, running the test
> suite:
>
> *** checksum after padding =  623867b6af52795e5f214e9720beea8d
> MD2 ("") = 8350e5a3e24c153df2275c9f80692773
> *** checksum after padding =  19739cada3ba281693348e9d256fff31
> MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
> *** checksum after padding =  19e29d1b7304368e595a276f302f57cc
> MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
> *** checksum after padding =  56d65157dedfcd75a7b1e82d970eec4b
> MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
> *** checksum after padding =  539ba695f264f365bcabc5c8b10913c7
> MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
> *** checksum after padding =  365fe0617f5f56a56090af1cfd6caac3
> MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
> *** checksum after padding =  9acf39425d22c4e3b4ddbdc563d23716
> MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf
>
> As you can see, the checksums start to go wrong when we hit 16 bytes.
>
> So, let us turn our attention to the checksum function.
>
> Here's the python i wrote:
>
> def checksum_old(c, buf): # c is checksum array, buf is input block
>  	l = c[-1]
>  	for i in xrange(digest_size):
>  		l = S[(buf[i] ^ l)]
>  		c[i] = l
>
> Here's the C from the RFC:
>
>    unsigned int i, j, t;
>    t = checksum[15];
>    for (i = 0; i < 16; i++)
>      t = checksum[i] ^= PI_SUBST[block[i] ^ t];
>
> Spot the difference. Yes, the assignment into the checksum array is a ^=,
> not a straight = - checksum bytes get set to
> current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
> accumulator). Translating that into python, we get:
>
> def checksum(c, buf):
>  	l = c[-1]
>  	for i in xrange(digest_size):
>  		l = S[(buf[i] ^ l)] ^ c[i]
>  		c[i] = l
>
> And when we put that back into the code, we get the right digests out.
> Victory!
>
> However, here's what the pseudocode in the RFC says:
>
>           For j = 0 to 15 do
>              Set c to M[i*16+j].
>              Set C[j] to S[c xor L].
>              Set L to C[j].
>            end /* of loop on j */
>
> I certainly don't see any sign of a xor with the
> current-value-of-checksum-byte in there - it looks like the C and
> pseudocode in the RFC don't match up.
>
> And, yes, googling for "RFC 1319 errata" brings up a report correcting
> this. They really ought to amend RFCs to mention errata!
>
> Correct code here:
>
> http://urchin.earth.li/~twic/md2.py
>
> tom
>
> --
> Mathematics is the door and the key to the sciences. -- Roger Bacon

Hi Tom,

thank you very much for your analysis and your solution. Great.
(My knowledge of C language is not that good.)
Wolfgang