[Cryptography-dev] Create Fernet API allowing streaming encryption and decryption from file-like objects.

Mon Jan 19 16:51:19 CET 2015

> On Jan 19, 2015, at 10:19 AM, Michael Iverson <dr.michael.iverson at gmail.com> wrote:
> 
> Hello, 
> 
> I'm new to the cryptography library, and I'm definitely excited about having a well-reviewed cryptographic library with a simple API. 
> 
> However, I'm noticing that there is an area that might improve the usefulness of the high level methods. The Fernet API is where my interest lies, as I'm presently in need of a symmetric algorithm. However, the idea could be equally applicable for other APIs.
> 
> The Fernet API presently requires that a complete buffer containing the plaintext or ciphertext be passed to the encryption or decryption methods.  This requirement becomes memory inefficient for moderately sized objects, and can prevent processing of large objects entirely, especially on memory constrained systems.
> 
> Furthermore, many python libraries use file handles as an abstraction for incrementally consuming or producing data.  Examples include http responses in Tornado and Cyclone, and the SFTP interface in paramiko. 
> 
> I'd like to propose the addition of an alternate API that would accept and return file handles, and incrementally encrypt or decrypt using the handles. I think this would make the library more useful for a variety of solutions, enhancing adoption. 
> 
> On the surface, it appears that the main cryptographic primitives (hmac, padding, aes, etc.) are designed to operate in an incremental fashion, using the update() method to incrementally compute data, and finalize() to return the final result, so the change may not be overly difficult. 
> 
> I'm willing to contribute the code for such an endeavor, as I'm going to write it anyway for a current project. Contributing the code will help ensure it is adequately reviewed. 
> 
> Does anyone feel this would be a worthwhile improvement?

The problem with streaming APIs (and why the recipes layer doesn’t currently have anything to work with them) is that it gives data to the user prior to it being authenticated.

For example, let’s say someone modifies the first chunk of a big file that is encrypted, and you do something like:

with open("decrypted.txt", "wb") as d_fp:
    with open("encrypted", "rb") as s_fp:
        # decrypt_file is a hypothetical streaming API, not a real function
        for chunk in cryptography.decrypt_file(s_fp):
            d_fp.write(chunk)

This is a fairly obvious way of handling that. However, it’ll write a whole bunch of data to decrypted.txt and only fail after the very last chunk. Software or humans might not notice that and start operating on that data even though it hasn’t been authenticated (perhaps instead of writing to a file it’s writing to stdout and you’re using it in a unix pipe or something). The attractiveness of the current API for Fernet is that you either get back completely authenticated data or you don’t, so there is no “oops an error happened but I didn’t notice and operated on data that was invalid”.
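
The failure mode above can be reproduced with a small standard-library sketch. It is not Fernet and does no encryption: it assumes a made-up format of data followed by a single HMAC-SHA256 tag, and the names seal and stream_verify are hypothetical. The point is only that a streaming reader has to release data before it reaches the tag.

```python
import hashlib
import hmac
import io

TAG_LEN = 32  # length of the HMAC-SHA256 tag appended after the data


def seal(key, data):
    """Append one tag covering the whole message (hypothetical format)."""
    return data + hmac.new(key, data, hashlib.sha256).digest()


def stream_verify(key, fp, chunk_size=4):
    """Yield data as it is read; the tag can only be checked at the very end."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    held = b""
    while True:
        block = fp.read(chunk_size)
        if not block:
            break
        held += block
        # Hold back the last TAG_LEN bytes, since they may be the tag.
        if len(held) > TAG_LEN:
            chunk, held = held[:-TAG_LEN], held[-TAG_LEN:]
            mac.update(chunk)
            yield chunk  # released before authentication
    if not hmac.compare_digest(held, mac.digest()):
        raise ValueError("authentication failed")


key = b"k" * 32
sealed = bytearray(seal(key, b"pay alice 10 dollars"))
sealed[4] ^= 0x01  # tamper with one byte near the start

written = []
try:
    for chunk in stream_verify(key, io.BytesIO(bytes(sealed))):
        written.append(chunk)  # stands in for d_fp.write(chunk)
    failed = False
except ValueError:
    failed = True
```

By the time the ValueError is raised, every tampered byte has already been handed to the caller and collected in written.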

As you noticed, this starts to break down for objects that are large, especially ones that are so large they cannot reasonably be held completely in memory. Currently, due to the above issues, handling large objects requires using the hazmat API because we haven’t yet come up with and implemented a “safe” way of handling large objects. One likely candidate is to do something similar to what TLS does: break the large object up into chunks of a certain size, and encrypt and authenticate each chunk separately. Since each chunk operates independently, you are essentially calling the “one shot” APIs for each chunk, and only hand each chunk back to the caller of the API once it has been fully authenticated. This allows us to control how much we need to buffer in memory at any one time (basically the size of the chunk). This would also need some higher level authentication across all of the chunks, or a serial number or something similar, to ensure that chunks don’t arrive out of order and that no chunk is missing.
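
A rough sketch of the chunked idea, using only the standard library, might look like the following. It is an illustration under assumptions, not a proposed format: encryption is left out (a real scheme would run each chunk through an authenticated encryption one-shot call), and the record layout, seal_chunks, and open_chunks are all hypothetical names. Each record carries a sequence number and a "last chunk" flag under its own HMAC tag, so a chunk is released only after it has been authenticated and found to be in order.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32
HEADER = struct.Struct(">QB")  # 8-byte sequence number, 1-byte "last chunk" flag


def seal_chunks(key, data, chunk_size):
    """Split data into records, each authenticated on its own."""
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    records = []
    for seq, piece in enumerate(pieces):
        header = HEADER.pack(seq, int(seq == len(pieces) - 1))
        tag = hmac.new(key, header + piece, hashlib.sha256).digest()
        records.append(header + piece + tag)
    return records


def open_chunks(key, records):
    """Yield a chunk only after its tag and its position have been checked."""
    expected_seq = 0
    finished = False
    for record in records:
        if finished:
            raise ValueError("data after final chunk")
        if len(record) < HEADER.size + TAG_LEN:
            raise ValueError("record too short")
        header = record[:HEADER.size]
        piece = record[HEADER.size:-TAG_LEN]
        tag = record[-TAG_LEN:]
        good = hmac.new(key, header + piece, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, good):
            raise ValueError("chunk failed authentication")
        seq, last = HEADER.unpack(header)
        if seq != expected_seq:
            raise ValueError("chunk missing or out of order")
        expected_seq += 1
        finished = bool(last)
        yield piece
    if not finished:
        # Truncation is only detectable once the input runs out.
        raise ValueError("stream truncated")


key = b"k" * 32
records = seal_chunks(key, b"0123456789", chunk_size=4)
plaintext = b"".join(open_chunks(key, records))
```

Memory use is bounded by one record at a time. Note the limits of the sketch: truncation is still only reported after earlier chunks have been released, and nothing ties a record to one particular stream, so a real format would also need something like a per-stream identifier.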

Ideally we’d use something like the above scheme that is backed by some sort of standard, whether an RFC or something else that doesn’t involve us creating our own format. However, it might be the case that we need to create our own format, and if we do then we should attempt to make that a standard and get it reviewed by cryptographers to make sure that we aren’t forgetting something.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
