[PYTHON-CRYPTO] PyCrypto rant thread

Thu Oct 16 15:38:08 CEST 2008

On Oct 10, 2008, at 18:25, Dwayne C. Litzenberger wrote:

> I'm sure there are a number of people on this list who are  
> currently _not_ using PyCrypto.  I know PyCrypto has problems, but  
> as its new maintainer, it would be useful for me to find out  
> exactly what problems others have had with it, so I can figure out  
> what direction I should go with it.
>
> So, what do you hate about PyCrypto?

Hey, it's not often one gets an invitation like this!

I used PyCrypto for a couple of years, first for a prototype of Phil  
Zimmermann's Zfone project [1] and then in allmydata.org Tahoe [2].

The first thing that I didn't like was that CTR mode made a Python
callback for each block, which was too slow and CPU-intensive.  I
wrote a patch to do CTR mode entirely in C code, supporting only a
simple incrementing counter, and submitted that patch to AMK, who was
then the ostensible maintainer of PyCrypto.
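For readers who haven't looked inside CTR mode, here is a rough sketch of what "a simple incrementing counter" means.  The counter block for block i is just nonce || (initial counter + i), computed inline, so nothing has to call back into user code once per block.  This is NOT PyCrypto's API and NOT my patch: the names ctr_xor and toy_block_encrypt are hypothetical, and because CTR mode only ever runs the block cipher in the forward direction, a truncated HMAC-SHA256 stands in for AES here purely to keep the example stdlib-only.  A real implementation would use a real block cipher and would do this loop in C.

```python
import hashlib
import hmac

BLOCK_SIZE = 16  # bytes, the AES block size

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Hypothetical stand-in for a real block cipher such as AES:
    # a keyed PRF (HMAC-SHA256) truncated to one block.  Illustration only.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK_SIZE]

def ctr_xor(key: bytes, nonce: bytes, data: bytes, initial_counter: int = 0) -> bytes:
    """Encrypt or decrypt `data` in CTR mode (the operation is its own inverse).

    Each counter block is nonce || counter, with the counter encoded
    big-endian and incremented by one per block.
    """
    counter_len = BLOCK_SIZE - len(nonce)
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        counter = initial_counter + offset // BLOCK_SIZE
        counter_block = nonce + counter.to_bytes(counter_len, "big")
        keystream = toy_block_encrypt(key, counter_block)
        chunk = data[offset:offset + BLOCK_SIZE]
        # zip() stops at the shorter input, so a final partial block just
        # uses the first len(chunk) bytes of keystream.
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)
```

Running ctr_xor twice with the same key and nonce gives back the plaintext, and starting at initial_counter=1 reproduces the keystream of the second block, which is the whole point of an incrementing counter: any block's keystream can be computed directly from its index.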

The second thing that I didn't like was that AMK never replied --  
PyCrypto was more or less unmaintained at that time.

The third thing that I didn't like was that my encrypted phone  
software didn't work when I tried to make a call one day.  It just so  
happened that on this day I was, for the first time, trying to make a  
call where one endpoint was on x86 CPU and the other endpoint was on  
amd64 CPU, although it didn't occur to me at first that this fact  
would be relevant.  I assumed that there was yet another bug in my  
setup/negotiation/streaming code, and spent a long time experimenting  
with it before I finally proved to myself that my code was doing the  
right thing and that only a bug in PyCrypto's SHA-256 implementation  
could explain the failure.  I ran PyCrypto's SHA-256 to produce the  
hashes of known inputs and sure enough, it was producing the wrong  
answers on my amd64 machine but the right ones on my x86 machine.
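The check that finally settled it, hashing known inputs and comparing against known answers, is cheap enough to run on every platform you deploy to.  A minimal sketch: the "abc" and 448-bit vectors are the examples published with FIPS 180-2, the empty-string digest is the well-known one, and failing_vectors is a hypothetical helper that accepts any SHA-256 implementation as a function from bytes to a hex string.  Here hashlib stands in for the implementation under test.

```python
import hashlib
from typing import Callable, List

# Known-answer vectors for SHA-256.
SHA256_VECTORS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
    b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq":
        "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1",
}

def failing_vectors(sha256_hex: Callable[[bytes], str]) -> List[bytes]:
    """Return every input whose digest from `sha256_hex` differs from the known answer."""
    return [msg for msg, expected in SHA256_VECTORS.items()
            if sha256_hex(msg) != expected]

def hashlib_sha256_hex(msg: bytes) -> str:
    # The implementation under test; swap in whichever one you actually ship.
    return hashlib.sha256(msg).hexdigest()
```

An empty list from failing_vectors(hashlib_sha256_hex) means all vectors matched on this machine; the x86/amd64 story above is the argument for running it on each architecture rather than once.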

I investigated and figured out that PyCrypto had copied the SHA-256  
implementation from LibTomCrypt years earlier, and that the bug had  
subsequently been fixed in LibTomCrypt itself.

The fourth thing that I didn't like was that my secure distributed  
filesystem (Tahoe) had a mysterious unit test failure one day.  It  
turned out that the new version of PyCrypto's SHA-256 implementation,  
which was another, newer copy from LibTomCrypt, had a *different*
bug, one that caused incorrect output whenever the input was a  
certain length in bytes, modulo 64.  Again, this bug had since been  
fixed in LibTomCrypt, but the fix had not been copied back to PyCrypto.
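A handful of fixed test vectors can easily miss a bug that only bites at particular input lengths.  A sketch of the cheap countermeasure, assuming you have a trusted reference to compare against (hashlib here): hash every length from 0 up to a few blocks so that every residue modulo 64, the SHA-256 block size in bytes, gets exercised.  mismatched_lengths and buggy_sha256 are hypothetical names.  The bad residue of 55 in buggy_sha256 is an arbitrary illustration, not the actual PyCrypto bug, though it is a plausible place for one: 55 bytes is the longest message whose 0x80 padding byte and 8-byte length still fit in a single 64-byte block.

```python
import hashlib
from typing import Callable, List

def mismatched_lengths(candidate: Callable[[bytes], bytes], max_len: int = 200) -> List[int]:
    """Compare `candidate` with hashlib's SHA-256 for every input length
    0..max_len; with the default, every length modulo 64 is hit at least
    three times."""
    bad = []
    for n in range(max_len + 1):
        msg = bytes(i % 251 for i in range(n))  # deterministic test content
        if candidate(msg) != hashlib.sha256(msg).digest():
            bad.append(n)
    return bad

def buggy_sha256(msg: bytes) -> bytes:
    # Hypothetical broken implementation: wrong whenever len(msg) % 64 == 55.
    digest = hashlib.sha256(msg).digest()
    if len(msg) % 64 == 55:
        return bytes([digest[0] ^ 1]) + digest[1:]
    return digest
```

Against buggy_sha256 the sweep reports lengths 55, 119 and 183, while a test suite that only hashed "abc" and the empty string would pass.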

At this point I gave up and switched away from using PyCrypto to  
writing my own Python package -- pycryptopp [3] -- which uses the  
Crypto++ library [4].  Crypto++ has many things to recommend it,  
especially the fact that it has thorough self-tests, so if there ever  
were a bug which caused incorrect output of a hash function on a new  
CPU architecture, or a bug which caused incorrect output of a hash  
function for certain input sizes, then it would be immediately  
detected the next time someone ran the tests, and the bug would not  
live long enough to worm its way into other projects.

In addition to the unit tests which come with the Crypto++ library  
itself, I also wrote fairly thorough unit tests myself for the  
pycryptopp wrapper code that I wrote.  I figured that even if Crypto++
were correct, a bug in my Python wrapper code could cause the
resulting pycryptopp library to be incorrect, and this paranoia  
turned out to be well-founded, as I've already written at least two  
significant bugs in my pycryptopp wrapper code, both of which were  
quickly detected by running the pycryptopp self-tests.
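To give a flavor of what tests aimed at the wrapper layer (rather than at the underlying library) might check, here is a sketch.  Sha256Wrapper is a hypothetical stand-in for binding code around a C++ hash object, not pycryptopp's actual API; it delegates to hashlib so the example runs anywhere.  The tests target the bug classes a wrapper tends to introduce on its own: a wrong answer on a known input, buffering mistakes when data arrives in chunks that straddle block boundaries, and state corruption when digest() is called more than once.

```python
import hashlib
import unittest

class Sha256Wrapper:
    """Hypothetical thin wrapper standing in for binding code around a C++ hash."""

    def __init__(self, data: bytes = b""):
        self._h = hashlib.sha256()
        self.update(data)

    def update(self, data: bytes) -> None:
        self._h.update(data)

    def digest(self) -> bytes:
        return self._h.digest()

class WrapperTests(unittest.TestCase):
    def test_known_answer(self):
        self.assertEqual(
            Sha256Wrapper(b"abc").digest().hex(),
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")

    def test_chunked_updates_match_one_shot(self):
        msg = bytes(range(256)) * 3
        expected = Sha256Wrapper(msg).digest()
        # Chunk sizes chosen to land on, just before, and just after block boundaries.
        for chunk in (1, 7, 63, 64, 65, 500):
            w = Sha256Wrapper()
            for i in range(0, len(msg), chunk):
                w.update(msg[i:i + chunk])
            self.assertEqual(w.digest(), expected)

    def test_digest_is_repeatable(self):
        w = Sha256Wrapper(b"data")
        self.assertEqual(w.digest(), w.digest())

if __name__ == "__main__":
    unittest.main()
```

Tests like these can only catch wrong output; a segfault in the native code takes down the test process itself, so it shows up as a crashed run rather than a failed assertion.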

(I say that I've written "at least" two significant bugs because the  
tests can't prove that I haven't written other bugs into the Python  
wrapper code or that the Crypto++ authors haven't written other bugs  
into the Crypto++ library itself -- the tests just caught the obvious  
ones -- incorrect output and segfault respectively.)

So the fifth and most important thing that I didn't like about  
PyCrypto was: insufficiently thorough tests and insufficiently  
careful quality control, e.g. copying code from LibTomCrypt and then  
not watching the LibTomCrypt project to see if that code was  
subsequently discovered to be buggy.

Note that Crypto++ itself has a pretty good track record of code  
quality.  It was also the first open source crypto library to be  
certified as FIPS 140-2 level 1 compliant (the second was OpenSSL).

Thanks for asking!  I feel much better having had a Rant with my  
morning coffee, and now I look forward to a happy day of trying to  
debug that damned seg fault.  :-)  You can see if I succeeded by  
watching the pycryptopp buildbot waterfall:

http://allmydata.org/buildbot-pycryptopp/waterfall?reload=60

If the tests all start passing, then the lights will all turn green.

Regards,

Zooko

[1] http://zfoneproject.org
[2] http://allmydata.org
[3] http://allmydata.org/trac/pycryptopp
[4] http://cryptopp.com
---
http://allmydata.org -- Tahoe, the Least-Authority Filesystem
http://allmydata.com -- back up all your files for $5/month