Pure Python Data Mangling or Encrypting

Ian Kelly ian.g.kelly at gmail.com
Sat Jun 27 01:47:22 EDT 2015


On Fri, Jun 26, 2015 at 9:38 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> With respect, Randall, you contradict yourself. Is it any wonder that some
> of us (well, me at least) are suspicious and confused, when your story
> changes as often as the weather?
>
> Sometimes you say that the client software uses AES encryption. Sometimes
> you say that you don't want to use AES encryption because you want the
> client to be pure Python, and a pure-Python implementation would be too
> slow. Your very first post says:
>
>     My original idea was for the recipient to encrypt using AES.  But
>     I want to keep this software pure Python "batteries included" and
>     not require installation of other platform-dependent software.
>     Pure Python AES and even DES are just way too slow.

In the context of the initial post, this was referring to the data
mangling done by the receiver; it has no bearing on the form of the
data sent by the application.

> Sometimes you say the user is supposed to encrypt the data themselves:
>
>     While the data senders are supposed to encrypt data, that's not
>     guaranteed

Whereas this clearly describes the behavior of the application itself.

> Now you say that the application encrypts the data, except that the user can
> turn that option off.
>
> Just make the AES encryption mandatory, not optional. Then the user cannot
> upload unencrypted malicious data, and the receiver cannot read the data.
> That's two problems solved.

And what if somebody else writes a competing version of the client
software that doesn't bother with the encryption step at all? The
point was that while encryption is expected, it cannot be assumed by
the receiver, and in fact if the data is actually malicious, then it
likely is not even being sent by the client software in the first
place.

> If the app does encrypt the data with AES before sending, then you don't
> gain any benefit by obfuscating an encrypted file with a classical
> monoalphabetic substitution cipher.

Only if the recipient can *trust* the sender to have performed the
encryption, which it can't, no matter how mandatory the OP tries to
make it.

> Suppose that you hire an intern to write the "choose key" function, and not
> knowing any better, he simply iterates through the keys in numeric order,
> one after the other. So the first upload will use key 0, the second key 1,
> the third key 2, and so on, until key 256! - 1, then start again. In that
> case, predicting the next key is *trivial*. If I can work out what key you
> send now (I just upload a file containing "\x00\x01\x02...\xFF" to myself
> and see what I get), then I know what key the app will use next.

If you upload a file to yourself, the result that you get will have no
bearing on what key might be chosen when you upload a file to somebody
else.
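
Just to be concrete about the attack being described: *if* the sender could
somehow predict the substitution key the receiver is about to use, crafting a
payload is just a matter of applying the inverse permutation. A rough sketch
(the key and payload below are obviously made up):

    # Suppose the attacker has somehow predicted the receiver's next key,
    # a permutation of the 256 byte values.
    predicted_key = bytes(range(255, -1, -1))   # made-up stand-in

    # Invert the permutation: inverse[predicted_key[p]] == p for every p.
    inverse = bytearray(256)
    for plain, mangled in enumerate(predicted_key):
        inverse[mangled] = plain

    # Pre-image of the content the attacker wants to appear after mangling.
    malicious = b"malicious content the attacker wants on disk"
    upload = bytes(inverse[b] for b in malicious)

    # When the receiver substitutes with predicted_key, the malicious
    # content comes right back out.
    assert bytes(predicted_key[b] for b in upload) == malicious

But as I said, the key is chosen on the remote end, per upload, so there is
nothing for the sender to observe that would let him predict it.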

> Even if I can't do that, I might be able to guess the seed: I know what time
> the application started up, to within a few milliseconds,

How?

> and I know (or
> can guess) how many random numbers you have used,

How?

> Except... you're getting your random numbers from a system *I* control.

No, you don't. If you already controlled the target system, then, as
already suggested, you would have no need to attack the data upload; you
could just write whatever data you wanted to disk. This is like suggesting
that the sudoers file is insecure because a user with root access
would be able to add themselves to it.

>> If the attacker controlled the machine the app was on, why would he fool
>> with /dev/urandom?  I think he'd just plant the files he wanted to plant
>> and be done.  This is nonsensical anyway.
>
> No, you don't understand the nature of the attack. In this scenario, the
> sender is the attacker. I want to upload malicious files to the receiver.
> You are trying to stop me, that's the whole point of "mangling or
> encrypting" the files. (Your words.) So I, the sender, prepare a file such
> that when you mangle it, the resulting mangled content is the malicious
> content I want.
>
> If you use a substitution cipher, I can do this if I can guess or force the
> key. If you use strong crypto, I can't.
>
> However, I can hack the application. The client sits on my computer, it's
> pure Python, even if it isn't I can still hack the application, I don't
> need access to the source code.

If the recipient system is using the system random to generate the
key, then you can hack the application all you want, and it will give
you precisely zero information about the state of the entropy pool on
the remote system.
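
Concretely, "using the system random" means something like this on the
receiving side (a sketch only, not the OP's actual code):

    import random

    _sysrand = random.SystemRandom()   # backed by os.urandom on the receiver

    def fresh_key():
        """One substitution key per upload: a random permutation of 0-255."""
        table = list(range(256))
        _sysrand.shuffle(table)        # entropy comes from the receiver's OS
        return bytes(table)

Nothing the sender hacks on his own machine tells him what that shuffle will
produce on the receiver's machine.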

> Yes. Do you think that's hard for an attacker who has access to your
> application, possibly including the source code, and controls all the
> sources of entropy on the system your application is running on?
>
> I don't have to *randomly* guess. I control what time your application
> starts, I control what randomness you get from /dev/urandom, I control how
> many keys you go through, I might even be able to read the source code of
> the application (not that I need to, that just makes it easier).

I think what's going on here is that you've missed the point that the
obfuscation is done by the recipient, not by the sender. The sender
has no control over any of those things, and no access to either the
key or the obfuscated data unless they can gain it through some other
attack vector (which is certainly a factor to consider before
implementing this).
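
To put the whole picture in one place, the receiver-side flow under
discussion is roughly this (made-up names, and obviously simplified):

    import random

    def store_upload(raw_bytes, dest_path):
        """Mangle and store an upload; the sender never sees any of this."""
        sysrand = random.SystemRandom()
        key = list(range(256))
        sysrand.shuffle(key)                        # key chosen on the receiver
        mangled = bytes(key[b] for b in raw_bytes)  # only this form touches disk
        with open(dest_path, 'wb') as f:
            f.write(mangled)
        return bytes(key)   # kept on the receiving side, never sent back

The key and the mangled bytes exist only on the recipient's machine, which is
why an attack on them has to come through some other vector.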


