Pure Python Data Mangling or Encrypting

Steven D'Aprano steve at pearwood.info
Fri Jun 26 13:06:57 EDT 2015


On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:

> You're making the same mistake that Steven did in misunderstanding the
> threat model.

I don't think I'm misunderstanding the threat, I think I'm pointing out a
threat which the OP is hoping to just ignore.

In an earlier post, I suggested that the threat model should involve at
least *three* different attacks, apart from the usual man-in-the-middle
attacks on data in transit.

One is that the attacker is the person sending the data. E.g. I want to send
a nasty payload (say, malware, or an offensive image). Another is that the
attacker is the recipient of the file, who wants to read the sender's data.

As far as I can tell, the OP's plan to defend the sender's privacy is to
dump responsibility for encrypting the files in the sender's lap. As far as
I'm concerned, perhaps as many as one user in 20000 will pre-encrypt their
files. (Early adopters will be unrepresentative of the eventual user base
of this system. If this takes off, the user base will likely end up
dominated by people who think that "qwerty" is the epitome of unguessable
passwords.)

Users just don't use crypto unless their applications do it for them.

My opinion is that the application ought to do so, and not expect Aunt
Tillie to learn how to correctly use encryption software before uploading
her files. 

http://www.catb.org/jargon/html/A/Aunt-Tillie.html

It is the OP's prerogative to disagree, of course, but to me, if the OP's
app doesn't use strong crypto to encrypt users' data, that's tantamount to
saying they don't care about their users' data privacy. Using a
monoalphabetic substitution cipher to obfuscate the data is not strong
crypto.


> The goal isn't to prevent the attacker from working out 
> the key for a file that has already been obfuscated. Any real data
> that might be exposed by a vulnerability in the server is presumed to
> have already been strongly encrypted by the user.

I think that's a ridiculously unrealistic presumption, unless your user-base
is entirely taken from a very small subset of security savvy and
pedantically careful users.


> The goal is to prevent the attacker from guessing a key that hasn't
> even been generated yet, which could be exploited to engineer the
> obfuscated content into something malicious.

They don't need to predict the key exactly. If they can predict that the key
will be, let's say, one of these thousand values, then they can generate one
thousand files and upload them. One of them will match the key, and there's
your exploit. That's one attack.
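
A rough sketch of that first attack, under loudly stated assumptions: the
key is a 256-byte table applied with bytes.translate, and key generation is
weak enough that only 1000 keys are possible. The function key_from_seed and
the seed range are hypothetical, invented purely for illustration; nothing
here is taken from the OP's application.

```python
import random

IDENTITY = bytes(range(256))

def key_from_seed(seed):
    """Hypothetical weak key generation: a permutation chosen by a small seed."""
    table = list(IDENTITY)
    random.Random(seed).shuffle(table)
    return bytes(table)

# What the attacker wants the obfuscated file to contain.
payload = b"malicious payload"

# One upload per candidate key. bytes.maketrans(key, IDENTITY) builds the
# inverse table, so translating the payload with it gives the matching preimage.
uploads = {
    seed: payload.translate(bytes.maketrans(key_from_seed(seed), IDENTITY))
    for seed in range(1000)
}

# The application later picks one of the predictable keys (assumed value).
server_seed = 417
server_key = key_from_seed(server_seed)

# Find which uploads turn into the payload once the application obfuscates them.
hits = [seed for seed, data in uploads.items()
        if data.translate(server_key) == payload]
print(server_seed in hits)
```

The attacker never learns which key was used. They only need the real key
to be somewhere in their candidate list.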

A second attack is to force the key. The attacker controls the machine the
application is running on, they control /dev/urandom and can feed your app
whatever not-so-random numbers they like, so potentially they can force the
app to use the key of their choosing. Then they don't need 1000 files, they
just need one.
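
To illustrate the idea only: suppose, hypothetically, the key is built by
sorting the byte values by random tags read from the operating system. The
make_key function and its read_random parameter are my own invention for
this sketch, not the OP's code. If the attacker controls the randomness
source, the key is whatever they make it.

```python
import os

def make_key(read_random=os.urandom):
    """Hypothetical key generation: order the 256 byte values by random tags."""
    tags = [read_random(8) for _ in range(256)]
    order = sorted(range(256), key=lambda i: tags[i])
    return bytes(order)

def rigged(n):
    """Stand-in for a randomness source the attacker controls."""
    return b"\x00" * n

honest_key = make_key()        # some permutation of 0..255
forced_key = make_key(rigged)  # all tags equal, so the stable sort changes nothing

# With the rigged source the "cipher" is the identity mapping, so the file
# passes through obfuscation unchanged.
data = b"any payload at all"
print(data.translate(forced_key) == data)
```

A real attack would be messier than swapping a function argument, but the
consequence is the same: one file, one known key.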

That's two. Does anyone think that I've thought of all the possible attacks?

(Well, hypothetical attacks. I acknowledge that I don't know the
application, and cannot be sure that it *actually is* vulnerable to these
attacks.)

The problem here is that a monoalphabetic substitution cipher is not
resistant to preimage attacks. Your only defence is that the key is
unknown. If the attacker can force the key, or predict the key, or guess a
small range of keys, they can exploit your weak cipher.

(Technically, "preimage attack" is usually used to refer to attacks on hash
functions. I'm not sure if the same name is used for attacks on ciphers.)

https://en.wikipedia.org/wiki/Preimage_attack

With a strong crypto cipher, there are no known preimage attacks. Even if
the attacker knows exactly what key you are using, they cannot predict what
preimage they need to supply in order to generate the malicious payload
they want after encryption. (As far as I know.)

That is the critical issue right there. The sort of simple monoalphabetic
substitution cipher using bytes.translate that the OP is using is
vulnerable to preimage attacks. Strong crypto is not.
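
For the substitution cipher, the preimage calculation is trivial once the
key is known. A minimal sketch, assuming the key is a 256-byte table used
with bytes.translate; the helper names, the fixed seed and the payload bytes
are made up for the example.

```python
import random

def make_key(rng):
    """Return a 256-byte translation table: a random permutation of 0..255."""
    table = list(range(256))
    rng.shuffle(table)
    return bytes(table)

def invert_key(key):
    """Return the table that undoes `key` under bytes.translate."""
    inverse = bytearray(256)
    for plain, obfuscated in enumerate(key):
        inverse[obfuscated] = plain
    return bytes(inverse)

# A key the attacker is assumed to have predicted or forced.
key = make_key(random.Random(12345))

# What the attacker wants to exist after obfuscation.
payload = b"MZ\x90\x00 pretend this is malware"

# The preimage is the file the attacker actually uploads.
preimage = payload.translate(invert_key(key))

# The application obfuscates the upload and reproduces the payload exactly.
print(preimage.translate(key) == payload)
```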


> There are no 
> frequency-based attacks possible here, because you can't do frequency
> analysis on the result of a key that hasn't even been generated yet.

Frequency-based attacks apply to a different threat. I'm referring to at
least two different attacks here, with different attackers and different
victims. Don't mix them up.


> Assuming that you have no attack on the key generation itself, the

Not a safe assumption!


> best you can do is send a file deobfuscated with a random key and hope
> that the recipient randomly chooses the same key; the odds of that
> happening are 1 in 256!.

It's easy to come up with attacks which are no better than brute force. It's
the attacks which are better than brute force that you have to watch out
for.
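
For a sense of scale, here is the arithmetic behind the quoted 1 in 256!
figure, compared with an attacker who has narrowed the key down to a
thousand candidates (the thousand is the same illustrative number as
earlier, not a measured value).

```python
import math

# Number of possible keys: every permutation of the 256 byte values.
keyspace = math.factorial(256)

# Equivalent key length in bits for a blind guess (roughly 1684).
brute_force_bits = math.log2(keyspace)

# If the attacker can narrow the key to 1000 candidates, the effective
# key length collapses to about 10 bits.
candidates = 1000
narrowed_bits = math.log2(candidates)

print(round(brute_force_bits), round(narrowed_bits, 1))
```

The size of the keyspace only helps if the attacker really has to search
all of it.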


-- 
Steven



