Help wanted with md2 hash algorithm
Tom Anderson
twic at urchin.earth.li
Tue Jan 10 13:13:59 EST 2006
On Sun, 8 Jan 2006, Tom Anderson wrote:
> On Fri, 6 Jan 2006 wjb131 at web.de wrote:
>
>> below you find my simple python version of MD2 algorithm as described
>> in RFC1319 (http://rfc1319.x42.com/MD2). It produces correct results
>> for strings shorter than 16 Bytes and wrong results for longer strings.
>
> I guess the thing to do is extract the C code from the RFC and compile
> it, verify that it works, then stick loads of print statements in the C
> and the python, to see where the states of the checksum engines diverge.
Okay, i've done this. I had to fiddle with the source a bit - added a
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
took the corresponding includes out of md2c.c and mddriver.c (to avoid
duplicate definitions) - but after that, it built cleanly with:
gcc -DMD=2 *.c *.h -o mddriver
A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
even built cleanly with -Wall.
Running the test suite with mddriver -x gives results matching the test
vectors in the RFC - a good start!
Patching the code to dump the checksums immediately after updating with
the pad, and before updating with the checksum:
*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8
And here's my python code with the same modification, running the test
suite:
*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf
As you can see, the checksums start to go wrong when we hit 16 bytes.
So, let us turn our attention to the checksum function.
Here's the python i wrote:
def checksum_old(c, buf): # c is checksum array, buf is input block
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)]
c[i] = l
Here's the C from the RFC:
unsigned int i, j, t;
t = checksum[15];
for (i = 0; i < 16; i++)
t = checksum[i] ^= PI_SUBST[block[i] ^ t];
Spot the difference. Yes, the assignment into the checksum array is a ^=,
not a straight = - checksum bytes get set to
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
accumulator). Translating that into python, we get:
def checksum(c, buf):
l = c[-1]
for i in xrange(digest_size):
l = S[(buf[i] ^ l)] ^ c[i]
c[i] = l
And when we put that back into the code, we get the right digests out.
Victory!
However, here's what the pseudocode in the RFC says:
For j = 0 to 15 do
Set c to M[i*16+j].
Set C[j] to S[c xor L].
Set L to C[j].
end /* of loop on j */
I certainly don't see any sign of a xor with the
current-value-of-checksum-byte in there - it looks like the C and
pseudocode in the RFC don't match up.
And, yes, googling for "RFC 1319 errata" brings up a report correcting
this. They really ought to amend RFCs to mention errata!
Correct code here:
http://urchin.earth.li/~twic/md2.py
tom
--
Mathematics is the door and the key to the sciences. -- Roger Bacon
More information about the Python-list
mailing list