Pure Python Data Mangling or Encrypting

Randall Smith randall at tnr.cc
Fri Jun 26 16:09:51 EDT 2015


On 06/26/2015 12:06 PM, Steven D'Aprano wrote:
> On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:
>
>> You're making the same mistake that Steven did in misunderstanding the
>> threat model.
>
> I don't think I'm misunderstanding the threat, I think I'm pointing out a
> threat which the OP is hoping to just ignore.

I'm not hoping to ignore anything.  I didn't explain the entire system, 
as it was not necessary to find a solution to the problem at hand.  But 
since you want to make negative assumptions about what I didn't tell 
you, I'll gladly address your accusations of negligence.

>
> In an earlier post, I suggested that the threat model should involve at
> least *three* different attacks, apart from the usual man-in-the-middle
> attacks of data in transit.

All communication is secured using TLS, and authentication is handled 
by X.509 certificates.  This prevents man-in-the-middle attacks. 
Certificates are signed by CAs I control.
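
In case it helps, here's roughly what the server side of that looks 
like with Python's ssl module.  This is a sketch, not the actual 
deployment; the file names and port are placeholders.

    import socket
    import ssl

    # Only accept clients whose certificates were signed by my private CA.
    # All file names here are placeholders.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH,
                                         cafile="my-private-ca.pem")
    context.load_cert_chain(certfile="server-cert.pem",
                            keyfile="server-key.pem")
    context.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("0.0.0.0", 8443))
    sock.listen(5)
    tls_sock = context.wrap_socket(sock, server_side=True)
    conn, addr = tls_sock.accept()  # handshake fails if the client cert is bad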

>
> One is that the attacker is the person sending the data. E.g. I want to send
> a nasty payload (say, malware, or an offensive image). Another is that the
> attacker is the recipient of the file, who wants to read the sender's data.

The only person who can read a file is the owner.   AES encryption is 
built into the client software.  The only way data can be uploaded 
unencrypted is if encryption is intentionally disabled.
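
To make that concrete, the client-side step is roughly the following, 
assuming the third-party cryptography package's AES-GCM primitive.  The 
function names and key handling are illustrative, not the actual client 
code.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def encrypt_for_upload(plaintext, key):
        # AES-256-GCM gives confidentiality plus integrity.  Only the
        # owner, who holds the key, can read what the server stores.
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

    def decrypt_after_download(blob, key):
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(key).decrypt(nonce, ciphertext, None)

    key = AESGCM.generate_key(bit_length=256)  # kept by the owner, never uploaded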

>
> As far as I can tell, the OP's plan to defend the sender's privacy is to
> dump responsibility for encrypting the files in the sender's lap. As far as
> I'm concerned, perhaps as many as one user in 20000 will pre-encrypt their
> files. (Early adopters will be unrepresentative of the eventual user base
> of this system. If this takes off, the user base will likely end up
> dominated by people who think that "qwerty" is the epitome of unguessable
> passwords.)

Making assumptions again.  See above.  The client software encrypts by 
default.  You're also assuming there is no password strength checking.

>
> Users just don't use crypto unless their applications do it for them.

And it does.

>
> My opinion is that the application ought to do so, and not expect Aunt
> Tillie to learn how to correctly use encryption software before uploading
> her files.
>
> http://www.catb.org/jargon/html/A/Aunt-Tillie.html
>
> It is the OP's prerogative to disagree, of course, but to me, if the OP's
> app doesn't use strong crypto to encrypt users' data, that's tantamount to
> saying they don't care about their users' data privacy. Using a
> monoalphabetic substitution cipher to obfuscate the data is not strong
> crypto.

You've gone on a rampage about nothing.  My original description said 
the client was supposed to encrypt the data, but you want to assume the 
opposite for some unknown reason.

>
>
>> The goal isn't to prevent the attacker from working out
>> the key for a file that has already been obfuscated. Any real data
>> that might be exposed by a vulnerability in the server is presumed to
>> have already been strongly encrypted by the user.
>
> I think that's a ridiculously unrealistic presumption, unless your user-base
> is entirely taken from a very small subset of security savvy and
> pedantically careful users.

The difference is he's not assuming I'm a moron.  He's giving me the 
benefit of the doubt.  Besides, I actually said, "data senders are 
supposed to encrypt data".

In a networked system, you can't make assumptions about what the other 
peers are doing.  You have to handle what comes across the wire.  You 
also have to consider that you may come under attack.  That's what this 
is about.

>
>
>> The goal is to prevent the attacker from guessing a key that hasn't
>> even been generated yet, which could be exploited to engineer the
>> obfuscated content into something malicious.
>
> They don't need to predict the key exactly. If they can predict that the key
> will be, let's say, one of these thousand values, then they can generate one
> thousand files and upload them. One of them will match the key, and there's
> your exploit. That's one attack.

A thousand values???  Isn't it 256!, which is just freaking huge?

    import math; math.factorial(256)
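
To put that number in perspective:

    import math

    keyspace = math.factorial(256)    # number of possible translation tables
    print(keyspace.bit_length())      # roughly 1684 bits of key space
    print(len(str(keyspace)))         # more than 500 decimal digits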

>
> A second attack is to force the key. The attacker controls the machine the
> application is running on, they control /dev/urandom and can feed your app
> whatever not-so-random numbers they like, so potentially they can force the
> app to use the key of their choosing. Then they don't need 1000 files, they
> just need one.
>

If the attacker controlled the machine the app was on, why would they 
fool with /dev/urandom?  I think they'd just plant the files they wanted 
to plant and be done.  The scenario is nonsensical anyway.

> That's two. Does anyone think that I've thought of all the possible attacks?
>
> (Well, hypothetical attacks. I acknowledge that I don't know the
> application, and cannot be sure that it *actually is* vulnerable to these
> attacks.)
>
> The problem here is that a monoalphabetic substitution cipher is not
> resistant to preimage attacks. Your only defence is that the key is
> unknown. If the attacker can force the key, or predict the key, or guess a
> small range of keys, they can exploit your weak cipher.
>
> (Technically, "preimage attack" is usually used to refer to attacks on hash
> functions. I'm not sure if the same name is used for attacks on ciphers.)
>
> https://en.wikipedia.org/wiki/Preimage_attack
>
> With a strong crypto cipher, there are no known preimage attacks. Even if
> the attacker knows exactly what key you are using, they cannot predict what
> preimage they need to supply in order to generate the malicious payload
> they want after encryption. (As far as I know.)
>
> That is the critical issue right there. The sort of simple monoalphabetic
> substitution cipher using bytes.translate that the OP is using is
> vulnerable to preimage attacks. Strong crypto is not.

It isn't vulnerable to preimage attacks unless you can guess the key 
out of 256! possibilities.  The key doesn't even exist until after the 
data is sent.  Give me one plausible scenario where an attacker can 
cause malware to hit the disk after bytearray.translate with a 256-byte 
translation table and I'll be thankful to you.  As it stands now, you're 
either ignoring information I've already given, assuming I've made 
moronic design decisions and then pouncing on them, or completely 
misunderstanding the issue at hand.
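
For anyone following along, the step under discussion is roughly this 
(Python 3); the helper names are mine, not the actual application code, 
and the table is drawn from the OS CSPRNG only after the data has 
arrived.

    import random

    _sysrand = random.SystemRandom()  # backed by os.urandom

    def make_table():
        # One of 256! possible permutations of the byte values 0..255.
        perm = list(range(256))
        _sysrand.shuffle(perm)
        return bytes(perm)

    def obfuscate(data, table):
        # Map each byte through the table; data may be bytes or bytearray.
        return bytes(data).translate(table)

    def deobfuscate(data, table):
        # Build the inverse permutation to recover the original bytes.
        inverse = bytearray(256)
        for i, b in enumerate(table):
            inverse[b] = i
        return bytes(data).translate(bytes(inverse))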

>
>
>> There are no
>> frequency-based attacks possible here, because you can't do frequency
>> analysis on the result of a key that hasn't even been generated yet.
>
> Frequency-based attacks apply to a different threat. I'm referring to at
> least two different attacks here, with different attackers and different
> victims. Don't mix them up.
>
>
>> Assuming that you have no attack on the key generation itself, the
>
> Not a safe assumption!

For this case it is a safe assumption, for the same reason you're 
assuming the PSU isn't defective.  An attack on /dev/urandom, for 
instance, would also compromise TLS and every other sort of secure key 
generation, not just the key for the byte translation.  That is a 
separate problem altogether, with a separate solution.

>
>
>> best you can do is send a file deobfuscated with a random key and hope
>> that the recipient randomly chooses the same key; the odds of that
>> happening are 1 in 256!.
>
> It's easy to come up with attacks which are no better than brute force. It's
> the attacks which are better than brute force that you have to watch out
> for.
>

And that's why we're having this discussion.  Do you know of an attack 
in which you can control the output (say, at least 100 consecutive 
bytes) of data which goes through a 256-byte translation table, chosen 
randomly from 256! permutations after the data is sent?  If you do, I'm 
all ears!  But at this point you're just setting up straw men and 
knocking them down.



