[Cryptography-dev] Reducing the impact of Backends on higher level APIs.

Thu May 22 18:12:59 CEST 2014

So I have let this marinate in my brain all night and my only response is: you’ve convinced me.

On May 21, 2014 at 7:04:23 PM, David Reid (dreid at dreid.org) wrote:

In PyCA's Cryptography the backend is the object responsible for providing one
of many lowest common denominator APIs to binding specific interfaces.

``CipherBackend`` for example provides methods for interrogating the supported
ciphers, and creating encryption or decryption ``CipherContext`` objects. It
notably does NOT contain any methods for encrypting or decrypting bytes; it
instead returns a ``CipherContext`` provider, whose interface is designed to
ONLY care about transforming bytes. It does not even indicate in which direction
bytes should be transformed.

This makes it possible to construct APIs which are not aware of our
backend/binding infrastructure at all, and to simply provide alternative methods
for getting ``CipherContext`` objects.

For instance a trivial implementation of a ``ROT13CipherContext`` is:

.. code-block:: python

    import codecs

    class ROT13CipherContext(object):
        def update(self, data):
            str_data = data.decode('ascii')
            rot13 = codecs.encode(str_data, 'rot13')
            return rot13.encode('ascii')

        def finalize(self):
            return b""

This ``CipherContext`` provider can be passed to any code that expects a
``CipherContext`` and no backends need to be provided.
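
As a minimal sketch of that claim, the helper below relies only on ``update``
and ``finalize`` and never sees a backend. The name ``transform`` is
hypothetical and not part of cryptography; the ROT13 class is repeated so the
snippet runs on its own.

```python
import codecs


class ROT13CipherContext(object):
    def update(self, data):
        str_data = data.decode("ascii")
        return codecs.encode(str_data, "rot13").encode("ascii")

    def finalize(self):
        return b""


def transform(context, chunks):
    """Feed chunks through any CipherContext-like object (hypothetical helper)."""
    out = [context.update(chunk) for chunk in chunks]
    out.append(context.finalize())
    return b"".join(out)


# ROT13 is its own inverse, so a second pass restores the input.
scrambled = transform(ROT13CipherContext(), [b"Hello, ", b"world"])
restored = transform(ROT13CipherContext(), [scrambled])
```

Any other object with the same two methods could be handed to ``transform``
unchanged, whether or not a backend created it.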

This is desirable because the backends are a lowest common denominator and are
now free from the burden of being indefinitely extensible; after all, nothing
says that only a ``CipherBackend`` can produce a ``CipherContext``. A more
realistic example of using this capability is PyNaCl, which has support for
symmetric encryption but does not expose algorithms or modes and so does not fit
easily into the ``Cipher(algorithm, mode, backend)`` API.

Instead PyNaCl could expose a new object, ``SecretCipher(key, nonce)``:

.. code-block:: python

    from nacl.secret import SecretBox

    class SecretCipherContext(object):
        def __init__(self, operator, nonce):
            self._operator = operator
            self._nonce = nonce
            self._message = []

        def update(self, data):
            self._message.append(data)
            return b""

        def finalize(self):
            return self._operator(b"".join(self._message), self._nonce)

    class SecretCipher(object):
        def __init__(self, key, nonce):
            self._box = SecretBox(key)
            self._nonce = nonce

        def encryptor(self):
            return SecretCipherContext(self._box.encrypt, self._nonce)

        def decryptor(self):
            return SecretCipherContext(self._box.decrypt, self._nonce)

It would be cumbersome to try to fit this construction into ``CipherBackend``.
It would require either a custom backend that provided synthetic algorithm and
mode objects which are only used for activating the NaCl backend, or a
``NaCLBackend`` interface which is effectively implemented as above despite
there being no other providers.

However the above demonstrates that there is still a compatible API that PyNaCl
can expose to allow it to be a drop-in replacement for ``CipherContext`` objects
created by the ``CipherBackend``.
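
To make the buffering behaviour of that context concrete, here is a sketch that
runs without PyNaCl installed. ``ToyBox`` is a hypothetical stand-in for
``SecretBox`` and its XOR operation is NOT real encryption; it exists only so
that the ``SecretCipherContext`` shape from above can be exercised.

```python
class SecretCipherContext(object):
    def __init__(self, operator, nonce):
        self._operator = operator
        self._nonce = nonce
        self._message = []

    def update(self, data):
        # Nothing can be emitted until the whole message is known.
        self._message.append(data)
        return b""

    def finalize(self):
        return self._operator(b"".join(self._message), self._nonce)


class ToyBox(object):
    """Hypothetical stand-in for SecretBox. XOR is NOT secure."""

    def __init__(self, key):
        self._key = key

    def _xor(self, data, nonce):
        pad = self._key + nonce
        return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))

    def encrypt(self, data, nonce):
        return self._xor(data, nonce)

    def decrypt(self, data, nonce):
        return self._xor(data, nonce)


box = ToyBox(b"kkkk")
nonce = b"nnnn"

encryptor = SecretCipherContext(box.encrypt, nonce)
partial = encryptor.update(b"attack ") + encryptor.update(b"at dawn")
ciphertext = encryptor.finalize()

decryptor = SecretCipherContext(box.decrypt, nonce)
decryptor.update(ciphertext)
plaintext = decryptor.finalize()
```

The trade-off in this design is that ``update`` always returns ``b""`` and all
output arrives from ``finalize``, which callers of a ``CipherContext`` already
have to tolerate.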

The conclusion I've drawn from this is that not every feature requires
interfaces at both the primitives and the backend layer. Perhaps more
importantly not every feature SHOULD provide interfaces at both the primitives
and the backend layer, something we have already demonstrated in our PKCS7
padding implementation.

At this point we must consider the application of these ideas to the body of
Asymmetric Backends. The ``AsymmetricSignatureContext`` and the
``AsymmetricVerificationContext`` interfaces are obviously analogous to the
``CipherContext`` above. It is clear how they might be easily used in code that
is both unaware of any particular backend interfaces and unaware of the
particular asymmetric algorithms being used, and how alternative mechanisms for
producing these providers might not involve a backend at all.

In fact PyNaCl could easily expose NaCl's signature APIs as the above contexts
with little trouble and no need for a separate backend layer (implementation
left as an exercise for the reader).
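
One possible shape for that exercise, as a sketch only: both contexts buffer
the message and defer to injected callables. The method names ``update``,
``finalize`` and ``verify`` are assumptions about the context interfaces, the
class names are hypothetical, and ``toy_sign``/``toy_verify`` are placeholders
with no security value that a real provider would replace with its own signing
and verification operations.

```python
import hashlib


class BufferedSignatureContext(object):
    def __init__(self, sign):
        self._sign = sign
        self._chunks = []

    def update(self, data):
        self._chunks.append(data)

    def finalize(self):
        return self._sign(b"".join(self._chunks))


class BufferedVerificationContext(object):
    def __init__(self, verify, signature):
        self._verify = verify
        self._signature = signature
        self._chunks = []

    def update(self, data):
        self._chunks.append(data)

    def verify(self):
        if not self._verify(b"".join(self._chunks), self._signature):
            raise ValueError("signature did not verify")


# Placeholders only: a bare hash is NOT a signature scheme.
def toy_sign(message):
    return hashlib.sha256(message).digest()


def toy_verify(message, signature):
    return toy_sign(message) == signature


signer = BufferedSignatureContext(toy_sign)
signer.update(b"hello ")
signer.update(b"world")
signature = signer.finalize()

verifier = BufferedVerificationContext(toy_verify, signature)
verifier.update(b"hello world")
verifier.verify()  # raises ValueError on mismatch
```

Neither context mentions a backend or an algorithm, so code written against
them does not care where they came from.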

To reiterate, the important property is not that any specific provider is
unaware of backends, but that the interfaces do not force any particular object
to be concerned about the existence of backends. Knowledge of backends stays
firmly rooted in the construction of a few objects which produce the objects
that perform actual operations, and not all producers of the operation contexts
need to know about backends.

It is worth noting at this point that with the exception of ``RSAPrivateKey``,
``RSAPublicKey``, and the DSA equivalents, no methods exist in the primitive
interfaces that take a backend argument. This startling departure would seem to
indicate a problem with our understanding of the purpose of objects that provide
this interface.

``RSABackend`` specifically exposes a set of methods whose only job is to be
called from a concrete provider of the ``RSAPrivateKey`` to produce the above
``*Context`` interfaces.

I would propose instead that operations in our asymmetric interfaces can be
separated into two categories, operations you invoke to get a key (such as
generation or deserialization) and operations that you invoke with a key (such
as acquiring a signature or verification contexts and serialization).

Further I would suggest that only operations in the first category ever need to
potentially be aware of a backend.  Operations performed with a key in contrast
only ever need to know that they have a key.  The fact that the key may have
originated from a particular backend does not affect the user (except in that
some algorithms or serialization formats may not be supported by all backends,
and therefore by all keys produced by all backends).

Ignoring for a moment the representation of keys as python integers, and the
ability to use a key produced by one backend with another, the RSA interfaces
could begin to look like this::

    RSABackend
     |
     +-generate_rsa_key(public_exponent, key_size) -> RSAPrivateKey

    RSAPrivateKey
     |
     +-public_key() -> RSAPublicKey
     |
     +-signer(padding, algorithm) -> AsymmetricSignatureContext
     |
     +-decrypt(ciphertext, padding) -> bytes

    RSAPublicKey
     |
     +-verifier(padding, algorithm) -> AsymmetricVerificationContext
     |
     +-encrypt(plaintext, padding) -> bytes

Now you can see that you only need the backend for getting the key, and
beyond that it is assumed you have a backend specific ``RSAPrivateKey`` provider
that just uses that backend to perform the desired operations.
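
A rough sketch of that split, using entirely hypothetical ``Toy*`` classes that
perform no real cryptography: the backend appears only where the key is
created, the key privately remembers it, and the calling code
(``sign_message``) only ever handles the key.

```python
class ToySignatureContext(object):
    def __init__(self, backend, handle):
        self._backend = backend
        self._handle = handle
        self._chunks = []

    def update(self, data):
        self._chunks.append(data)

    def finalize(self):
        # Delegates to the backend that produced the key; no real crypto here.
        return self._backend._toy_sign(self._handle, b"".join(self._chunks))


class ToyRSAPrivateKey(object):
    def __init__(self, backend, handle):
        self._backend = backend  # private: callers never pass a backend
        self._handle = handle

    def signer(self, padding, algorithm):
        return ToySignatureContext(self._backend, self._handle)


class ToyBackend(object):
    def generate_rsa_key(self, public_exponent, key_size):
        # The tuple stands in for native, backend specific key material.
        handle = (public_exponent, key_size)
        return ToyRSAPrivateKey(self, handle)

    def _toy_sign(self, handle, message):
        return b"toy-signature-over:" + message


def sign_message(private_key, message):
    """Caller code: needs a key, never a backend."""
    context = private_key.signer(padding=None, algorithm=None)
    context.update(message)
    return context.finalize()


key = ToyBackend().generate_rsa_key(public_exponent=65537, key_size=2048)
signature = sign_message(key, b"hello")
```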

Other mechanisms for getting a private key for a specific backend would be
deserialization, which I imagine would be achieved by a specific backend
provider, implementing one of several asymmetric algorithm agnostic backend
interfaces, such as a ``PKCS8Backend`` or the
``TraditionalOpenSSLSerializationBackend``.

As for the handling of key serialization I think it is best treated as an
operation you perform with a key, and therefore a method on a particular key
provider communicated via an optional key interface.

For instance since the OpenSSL backend supports both RSA and PKCS8 it would
produce objects that provide both the ``RSAPrivateKey`` interface, and some
PKCS8 specific interface like::

    PKCS8Key
     |
     +-pkcs8_der() -> bytes
     |
     +-pkcs8_pem(passphrase, algorithm, mode) -> bytes

Another proposal for serialization APIs might be that there are functions for
performing serialization that take keys. Keys would then need a standard
interface for accessing their raw data (likely as python integers), and at
serialization time you would need access to a backend, either for constructing
a serializer object or to be passed directly to the serialization function
alongside the key.

Much more discussion needs to happen about the exact details of any
serialization API; I currently favor the ``PKCS8Key`` API however.

I'd now like to address the ability to use a key on multiple backends, which was
one of the driving factors (in addition to the test vectors) behind our earlier
decision to include the python integers that make up the components of an RSA
key on our RSA specific interfaces.

It is not clear to me that there is actual value in such a feature. So I would
like to officially propose that we do not plan to support it in the near future.

However it is clear that the way we decided to represent keys to hypothetically
support that feature has made some things more complicated. It has resulted in
many more methods on the backend (because a key once loaded or generated, is now
unrelated to a backend) and it has resulted in a proliferation of properties on
the key interface to expose the key as python integers for the purpose of later
conversion to backend specific objects.

This repeated conversion has both performance and security implications and is
not trivially mitigated without violating the key abstraction by attaching
private content to it or by utilizing a weakref cache of keys to backend
specific objects.

I believe the benefits of avoiding this repeated conversion and simplifying the
backend APIs by removing extra methods, and removing backend awareness from the
private key, outweigh the immediate downsides of losing the ability to use a
key loaded with one backend to perform an operation on another.

Transitioning to backend specific keys and removing the properties that expose
key components as integers from the key objects also has the side effect of
making the ``RSAPrivateKey`` the lowest common denominator interface for both
keys stored in memory and keys potentially stored in some HSM or other agent
that does not provide direct access to key components.

It is however still useful to take a series of integers and load them as a key.
Most immediately it is useful for the test vectors which are in a
non-standardized format that provides little value outside of testing multiple
backend implementations of the algorithms. To facilitate this testing, it seems
clear that any ``RSABackend`` would need to be able to create backend specific
keys from these numbers. However this situation is complicated by the fact that
not all backend providers use the same set of numbers to represent a key,
meaning significant transformation may need to be done by the backend to turn
the set of numbers in the vectors into the set of numbers that can be loaded by
the backend, or that the backend may need to convert the numbers to some
intermediate standard serialization that the backend under test already knows
how to load.

I believe it's important to support this representation and that by treating
it as a (de)serialization problem we can solve it relatively easily in the above
framework (incidentally this may provide a path towards supporting multi-backend
keys but I don't want to focus on that).

What is needed to support these is:

1. An in memory representation of the key components.
2. A backend method for loading this representation into a backend specific key.
3. An optional method for converting a backend specific key to this representation.

In a series of pull requests I've started to lay out a migration path towards this
format; however, the desired end result is somewhat obscured by the method of
preserving compatibility.

So to lay out what I expect will be the end result:

An in memory representation::

    RSAPrivateNumbers
     |
     +-public_numbers() -> RSAPublicNumbers
     |
     +-<attributes for accessing individual components>

    RSAPublicNumbers
     |
     +-<attributes for accessing individual components>
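
As a sketch of what such value objects could look like: the attribute set
below (``p``, ``q``, ``d``, ``e``, ``n``) is an assumption for illustration
only, since the outline above leaves the exact components open. The numbers
are a tiny textbook key and must never be used for real cryptography.

```python
class RSAPublicNumbers(object):
    def __init__(self, e, n):
        self.e = e
        self.n = n


class RSAPrivateNumbers(object):
    def __init__(self, p, q, d, e):
        # Hypothetical component set; plain integers, no backend involved.
        self.p = p
        self.q = q
        self.d = d
        self.e = e

    def public_numbers(self):
        return RSAPublicNumbers(self.e, self.p * self.q)


# Textbook-sized example key: p=61, q=53, e=17, d=2753.
private_numbers = RSAPrivateNumbers(p=61, q=53, d=2753, e=17)
public_numbers = private_numbers.public_numbers()

# Raw RSA round trip on an integer, only to show the numbers are consistent.
message = 65
recovered = pow(pow(message, public_numbers.e, public_numbers.n),
                private_numbers.d, public_numbers.n)
```

Because these objects hold nothing but integers, they can be built from test
vectors without touching any backend.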

It seems necessary given the reliance of our testing infrastructure on this
representation that you should always be able to load it, thus it seems like
it should be part of the previously specified ``RSABackend``::

    RSABackend
     |
     +-load_rsa_numbers(rsa_numbers) -> RSAPrivateKey | RSAPublicKey

I've chosen to support loading both objects in one method for the symmetry
with other deserialization APIs where you do not necessarily know if you have
a private or a public key.  However it seems harder to avoid knowing that you
have an RSA key so perhaps this attempt at preserving symmetry is in vain.

And much like the above ``PKCS8Key`` proposal it is expected that an ``RSABackend``
will sometimes produce keys which can be turned back into this format (though it is
not required, as in the case of supporting an HSM)::

    RSANumbersKey
     |
     +-rsa_numbers() -> RSAPrivateNumbers | RSAPublicNumbers
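
A sketch of how loading and the optional export interface might fit together.
Every class here is hypothetical: the ``Toy*`` keys simply keep the numbers
they were loaded from, ``HSMPrivateKey`` models a key that cannot export them,
and the namedtuple fields are an assumed component set.

```python
from collections import namedtuple

# Hypothetical component sets; the proposal leaves the exact attributes open.
RSAPublicNumbers = namedtuple("RSAPublicNumbers", ["e", "n"])
RSAPrivateNumbers = namedtuple("RSAPrivateNumbers", ["p", "q", "d", "e"])


class ToyRSAPublicKey(object):
    def __init__(self, numbers):
        self._numbers = numbers

    def rsa_numbers(self):
        return self._numbers


class ToyRSAPrivateKey(object):
    def __init__(self, numbers):
        self._numbers = numbers

    def rsa_numbers(self):
        return self._numbers


class HSMPrivateKey(object):
    """A key whose components never leave the device: no rsa_numbers()."""


class ToyBackend(object):
    def load_rsa_numbers(self, rsa_numbers):
        # One entry point; the type of the argument picks the key type.
        if isinstance(rsa_numbers, RSAPrivateNumbers):
            return ToyRSAPrivateKey(rsa_numbers)
        if isinstance(rsa_numbers, RSAPublicNumbers):
            return ToyRSAPublicKey(rsa_numbers)
        raise TypeError("expected RSAPrivateNumbers or RSAPublicNumbers")


def export_numbers(key):
    """Return the key's numbers, or None if it lacks the optional interface."""
    method = getattr(key, "rsa_numbers", None)
    return method() if method is not None else None


backend = ToyBackend()
private_key = backend.load_rsa_numbers(RSAPrivateNumbers(61, 53, 2753, 17))
public_key = backend.load_rsa_numbers(RSAPublicNumbers(17, 3233))
```

Callers that need the numbers check for the optional interface rather than
assuming every ``RSAPrivateKey`` provider can export them.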

Though my previous pull request specified two interfaces, I think I am leaning
more towards one interface and one method that returns either a public
or private representation for symmetry with both the loading API and the
other serialization format proposal.

I have now spent nearly 6 hours writing this email.  I hope it either
successfully convinces everyone who wishes to be involved or that it is a useful
starting point for further conversation.

More specifically I hope I have effectively communicated:

1) Why I think backend specific keys are desirable.
2) How I think they should be exposed to the user.
3) How this approach is more consistent with our other interfaces.
4) Why these properties are desirable in our other interfaces and should
   be mimicked in our asymmetric interface.
5) That backends are a necessary public API for cryptography to function,
   but an implementation detail of cryptography that is not necessary for
   useful levels of compatibility with cryptography.

-David