Pure Python Data Mangling or Encrypting

Steven D'Aprano steve at pearwood.info
Fri Jun 26 23:38:46 EDT 2015


On Sat, 27 Jun 2015 06:09 am, Randall Smith wrote:

> On 06/26/2015 12:06 PM, Steven D'Aprano wrote:
>> On Fri, 26 Jun 2015 11:01 am, Ian Kelly wrote:
>>
>>> You're making the same mistake that Steven did in misunderstanding the
>>> threat model.
>>
>> I don't think I'm misunderstanding the threat, I think I'm pointing out a
>> threat which the OP is hoping to just ignore.
> 
> I'm not hoping to ignore anything.  I didn't explain the entire system,
> as it was not necessary to find a solution to the problem at hand.  But
> since you want to make negative assumptions about what I didn't tell
> you, I'll gladly address your accusations of negligence.

"Negligence" is *your* word, not mine. I've never said that. And I'm not
*assuming* anything, everything I've stated has been based on the evidence
of what you have written. I've even gone so far as to EXPLICITLY say that I
cannot know for a fact that your application is vulnerable to these
threats, since I'm only going from a description rather than the app
itself. But your responses don't suggest that you have these threats under
control; on the contrary, they indicate that you are *far* underestimating
both the seriousness of the threats and the difficulty of running a secure
application on a machine you cannot trust.

If your application has any saving grace, it is that there are easier ways
to get malware onto somebody else's computer. There are a hundred million
unsecured Windows boxen out there, if I were malicious I would just hire a
bot net rather than spend the time trying to hack your system. But maybe
somebody else will do it just for the lulz, or to prove it can be done.
Some black hats like a challenge, and yours appears to fall nicely into
that middle ground of hard enough to be interesting but not hard enough to
be really difficult.


>> In an earlier post, I suggested that the threat model should involve at
>> least *three* different attacks, apart from the usual man-in-the-middle
>> attacks of data in transit.
> 
> All communication is secured using TLS and authentication handled by
> X.509 certificates.  This prevents man in the middle attacks.
> Certificates are signed by CAs I control.

You control the CAs? Presumably that means you run your own private CA
with a self-signed root (unless you mean you get to choose the CA). I
don't know if that makes a difference or not.


>> One is that the attacker is the person sending the data. E.g. I want to
>> send a nasty payload (say, malware, or an offensive image). Another is
>> that the attacker is the recipient of the file, who wants to read the
>> sender's data.
> 
> The only person who can read a file is the owner.   AES encryption is
> built into the client software.  The only way data can be uploaded
> unencrypted is if encryption is intentionally disabled.

With respect Randall, you contradict yourself. Is it any wonder that some
of us (well, me at least) are suspicious and confused, when your story
changes as often as the weather?

Sometimes you say that the client software uses AES encryption. Sometimes
you say that you don't want to use AES encryption because you want the
client to be pure Python, and a pure-Python implementation would be too
slow. Your very first post says:

    My original idea was for the recipient to encrypt using AES.  But
    I want to keep this software pure Python "batteries included" and
    not require installation of other platform-dependent software.  
    Pure Python AES and even DES are just way too slow.


Sometimes you say the user is supposed to encrypt the data themselves:

    While the data senders are supposed to encrypt data, that's not
    guaranteed


Now you say that the application encrypts the data, except that the user can
turn that option off.

Just make the AES encryption mandatory, not optional. Then the user cannot
upload unencrypted malicious data, and the receiver cannot read the data.
That's two problems solved.

Making AES or similarly strong encryption mandatory protects both the sender
of data and the receiver of data. I cannot imagine why you are considering
making it optional, since that only adds more work for you and reduces the
security of your users.

Oh, and DES is not good enough.
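
To be concrete: here's a rough sketch of what "always encrypt" could look
like. This is not your application, just an illustration, and it assumes
you're willing to depend on the third-party "cryptography" package rather
than staying pure stdlib:

    from cryptography.fernet import Fernet

    def encrypt_for_upload(path):
        # Fernet is AES-128-CBC plus an HMAC. The key is generated on
        # the sender's machine and never uploaded, so the receiver
        # cannot read the plaintext.
        key = Fernet.generate_key()
        with open(path, "rb") as f:
            token = Fernet(key).encrypt(f.read())
        return key, token  # keep the key locally, upload only the token

    def decrypt_download(key, token):
        # Only the owner, who kept the key, can recover the file.
        return Fernet(key).decrypt(token)

With the encryption always on, a malicious sender also can't control a
single byte of what lands on the receiver's disk.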


>> As far as I can tell, the OP's plan to defend the sender's privacy is to
>> dump responsibility for encrypting the files in the sender's lap. As far
>> as I'm concerned, perhaps as many as one user in 20000 will pre-encrypt
>> their files. (Early adopters will be unrepresentative of the eventual
>> user base of this system. If this takes off, the user base will likely
>> end up dominated by people who think that "qwerty" is the epitome of
>> unguessable passwords.)
> 
> Making assumptions again.  See above.  The client software encrypts by
> default.  You're also assuming there is no password strength checking.

My comment about "qwerty" as a password was a comment on the majority of
people on the internet, not an assumption about your application.


>> Users just don't use crypto unless their applications do it for them.
> 
> And it does.

Great to hear it! Just make sure your application always encrypts the
uploaded files, and you protect both the sender of the files and the
receiver.

At least from these threats. There are others.

Just to indicate how hard this is, here is a ten year old timing attack
against AES:

http://cr.yp.to/antiforgery/cachetiming-20050414.pdf

Is your application vulnerable to timing attacks? No idea. Talk to a
security expert.



>> My opinion is that the application ought to do so, and not expect Aunt
>> Tillie to learn how to correctly use encryption software before uploading
>> her files.
>>
>> http://www.catb.org/jargon/html/A/Aunt-Tillie.html
>>
>> It is the OP's prerogative to disagree, of course, but to me, if the OP's
>> app doesn't use strong crypto to encrypt users' data, that's tantamount
>> to saying they don't care about their users' data privacy. Using a
>> monoalphabetic substitution cipher to obfuscate the data is not strong
>> crypto.
> 
> You've gone on a rampage about nothing.  My original description said
> the client was supposed to encrypt the data, but you want to assume the
> opposite for some unknown reason.

Hardly a "rampage", and not "some unknown reason". End-users don't do
crypto. About one person in a hundred thousand digitally signs their
emails. Even people who know better don't use crypto. I don't use crypto.
If you are relying on people to encrypt their files, as you have suggested
in the past, *it won't happen*.

If the app does encrypt the data with AES before sending, then you don't
gain any benefit by obfuscating an encrypted file with a classical
monoalphabetic substitution cipher.


>>> The goal isn't to prevent the attacker from working out
>>> the key for a file that has already been obfuscated. Any real data
>>> that might be exposed by a vulnerability in the server is presumed to
>>> have already been strongly encrypted by the user.
>>
>> I think that's a ridiculously unrealistic presumption, unless your
>> user-base is entirely taken from a very small subset of security savvy
>> and pedantically careful users.
> 
> The difference is he's not assuming I'm a moron.  He's giving me the
> benefit of the doubt.  That plus I actually said, "data senders are
> supposed to encrypt data".

I'm not assuming you are a moron, I'm making a judgement based on your posts
that you might be out of your depth when it comes to crypto. Maybe I'm
wrong, and you know what you're doing, you just don't know how to
communicate that fact. This is an example: the sender is supposed to
encrypt the data, or the application encrypts it for them? Sometimes you
say the sender encrypts the data, sometimes you say the application
encrypts the data. It makes a big difference whether or not Aunt Tillie has
to run a separate encryption application before uploading her files.

If the application does it (a very good thing!) then there is the mystery of
why you would allow people to turn that option off. That can only make more
work for you and reduce the security of your application for both senders
and receivers.


> In a networked system, you can't make assumptions about what the other
> peers are doing.  You have to handle what comes across the wire.  You
> also have to consider that you may come under attack.  That's what this
> is about.

Right.

And you're in a difficult situation because your users (sender and receiver)
can't trust each other. This is harder than (say) Bittorrent.

With BT, the receiver can't trust the sender, but there's no threat to the
sender. (Well, legal threats, but that's a social issue, not a technical
one.) If I (let's assume legally) upload a file to a BT network, I don't
worry about downloaders reading the file. I want them to read it. That's
not the case here.


>>> The goal is to prevent the attacker from guessing a key that hasn't
>>> even been generated yet, which could be exploited to engineer the
>>> obfuscated content into something malicious.
>>
>> They don't need to predict the key exactly. If they can predict that the
>> key will be, let's say, one of these thousand values, then they can
>> generate one thousand files and upload them. One of them will match the
>> key, and there's your exploit. That's one attack.
> 
> Thousand Values ???  Isn't it 256!, which is just freaking huge!  import
> math; math.factorial(256)

No. There are 256! possible keys *in total*, but that doesn't necessarily
mean that the keys are unpredictable.

Suppose that you hire an intern to write the "choose key" function, and not
knowing any better, he simply iterates through the keys in numeric order,
one after the other. So the first upload will use key 0, the second key 1,
the third key 2, and so on, up to key 256! - 1, then starting again. In
that case, predicting the next key is *trivial*. If I can work out which
key you are using now (I just upload a file containing
"\x00\x01\x02...\xFF" to myself and see what comes back), then I know what
key the app will use next.
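
In code, the intern's generator and my probe attack are a few lines each.
(A sketch, using your own bytes.translate mangling scheme:)

    import itertools

    # The intern's "choose key" function: walk the permutations of
    # 0..255 in lexicographic order, so each key trivially determines
    # the next one.
    _keys = itertools.permutations(range(256))

    def next_key():
        return bytes(next(_keys))

    # My probe: translating b"\x00\x01...\xff" through a table returns
    # the table itself, so a single upload reveals the current key.
    probe = bytes(range(256))
    key = next_key()
    assert probe.translate(key) == key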

Now obviously you're not going to let the new intern write such a critical
part of the application, not without a senior developer doing a code
review. He reviews the code, and mixes the keys up in some fashion, the
more unpredictable the better. So he uses a random number generator to
choose the key. 

Maybe you use Python's standard library and the Mersenne Twister. The period
of that is huge: 2**19937 - 1, which is far bigger than 256!. So you think
that's safe. But it's not: the Mersenne Twister is not a cryptographically
secure pseudorandom number generator. If I can observe 624 consecutive
32-bit outputs from the Twister, I can reconstruct its internal state and
predict every value it will ever produce.

Even if I can't do that, I might be able to guess the seed: I know what time
the application started up, to within a few milliseconds, and I know (or
can guess) how many random numbers you have used, so I can predict that it
will be so far into the period, give or take a few hundred or a thousand
values. Instead of having to guess the key out of 256! possible values, I
now only have to guess the key out of 1000 possible values. I can keep
sending files to myself until I work out which key it is, and from that
point on I can now predict every key with 100% certainty.
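
That attack is only a few lines of Python. A sketch, assuming
(hypothetically) that the app seeds the Mersenne Twister with its start-up
time and shuffles 0..255 to build the translation table:

    import random
    import time

    def table_from_seed(seed):
        rng = random.Random(seed)
        table = list(range(256))
        rng.shuffle(table)
        return bytes(table)

    # The app, seeding from the clock:
    start = int(time.time())
    key = table_from_seed(start)

    # Me, knowing the start time to within a few seconds, brute-forcing
    # the handful of candidate seeds. (In practice I'd confirm each
    # guess with an uploaded probe file.)
    for guess in range(start - 10, start + 11):
        if table_from_seed(guess) == key:
            print("Recovered seed:", guess)
            break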

Okay, so you're smarter than that. You know that Python's Mersenne Twister
is not a CSPRNG, and you use /dev/urandom or its Windows equivalent.
Excellent.
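
For the record, that looks something like this. random.SystemRandom draws
from os.urandom, so earlier outputs don't let an attacker predict later
ones (assuming the OS entropy source itself isn't compromised):

    import random

    def fresh_key():
        # Every shuffle decision comes from the operating system's
        # entropy source, not from a deterministic generator that an
        # attacker can rewind or fast-forward.
        sysrand = random.SystemRandom()
        table = list(range(256))
        sysrand.shuffle(table)
        return bytes(table)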

Except... you're getting your random numbers from a system *I* control. I
can just feed you whatever "random" numbers I want.

I don't know how to solve that one. Writing secure code where you cannot
trust the machine you are running on is *immeasurably tougher* than writing
secure code on a trusted machine.


>> A second attack is to force the key. The attacker controls the machine
>> the application is running on, they control /dev/urandom and can feed
>> your app whatever not-so-random numbers they like, so potentially they
>> can force the app to use the key of their choosing. Then they don't need
>> 1000 files, they just need one.
>>
> 
> If the attacker controlled the machine the app was on, why would it fool
> with /dev/urandom?  I think he'd just plant the files he wanted to plant
> and be done.  This is nonsensical anyway.

No, you don't understand the nature of the attack. In this scenario, the
sender is the attacker. I want to upload malicious files to the receiver.
You are trying to stop me, that's the whole point of "mangling or
encrypting" the files. (Your words.) So I, the sender, prepare a file such
that when you mangle it, the resulting mangled content is the malicious
content I want.

If you use a substitution cipher, I can do this if I can guess or force the
key. If you use strong crypto, I can't.
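
Computing the preimage is trivial once I know the table. A sketch, with a
stand-in payload and a stand-in "forced" key:

    import random

    def invert(table):
        # If the table maps i -> table[i], the inverse maps
        # table[i] -> i.
        inverse = bytearray(256)
        for i, b in enumerate(table):
            inverse[b] = i
        return bytes(inverse)

    # Stand-ins for the key I forced or guessed, and for my payload:
    table = list(range(256))
    random.Random(1234).shuffle(table)
    key = bytes(table)
    malicious = b"#!/bin/sh\necho pwned\n"

    # I upload the preimage; your app "mangles" it and writes my
    # payload to the receiver's disk, byte for byte.
    preimage = malicious.translate(invert(key))
    assert preimage.translate(key) == malicious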

However, I can hack the application. The client sits on my computer, and
it's pure Python; even if it weren't, I could still hack it. I don't need
access to the source code.


>> That's two. Does anyone think that I've thought of all the possible
>> attacks?
>>
>> (Well, hypothetical attacks. I acknowledge that I don't know the
>> application, and cannot be sure that it *actually is* vulnerable to these
>> attacks.)
>>
>> The problem here is that a monoalphabetic substitution cipher is not
>> resistant to preimage attacks. Your only defence is that the key is
>> unknown. If the attacker can force the key, or predict the key, or guess
>> a small range of keys, they can exploit your weak cipher.
>>
>> (Technically, "preimage attack" is usually used to refer to attacks on
>> hash functions. I'm not sure if the same name is used for attacks on
>> ciphers.)
>>
>> https://en.wikipedia.org/wiki/Preimage_attack
>>
>> With a strong crypto cipher, there are no known preimage attacks. Even if
>> the attacker knows exactly what key you are using, they cannot predict
>> what preimage they need to supply in order to generate the malicious
>> payload they want after encryption. (As far as I know.)
>>
>> That is the critical issue right there. The sort of simple monoalphabetic
>> substitution cipher using bytes.translate that the OP is using is
>> vulnerable to preimage attacks. Strong crypto is not.
> 
> It isn't vulnerable to preimage attacks unless you can guess the key out
> of 256! possibilities.

Yes. Do you think that's hard for an attacker who has access to your
application, possibly including the source code, and controls all the
sources of entropy on the system your application is running on?

I don't have to *randomly* guess. I control what time your application
starts, I control what randomness you get from /dev/urandom, I control how
many keys you go through, I might even be able to read the source code of
the application (not that I need to, that just makes it easier).


> The key doesn't even exist until after the data 
> is sent.  Give me one plausible scenario where an attacker can cause
> malware to hit the disk after bytearray.translate with a 256 byte
> translation table and I'll be thankful to you.  As it stands now, you're
> either ignoring information I've already given, assuming I've made
> moronic design decisions then pouncing on them, or completely
> misunderstanding the issue at hand.

If you still don't understand the security threats after I've explained them
repeatedly, then your app is doomed to join the multitude of insecure and
unsafe applications on the internet.

Actually, the more I think about this, the more I come to think that the
only way this can be secure is for the sending client application and the
receiving client application to *both* encrypt the data. The sender can't
trust the receiver not to read the files, so the sender has to encrypt; the
receiver can't trust the sender not to send malicious files, so the
receiver has to encrypt too.


>>> There are no
>>> frequency-based attacks possible here, because you can't do frequency
>>> analysis on the result of a key that hasn't even been generated yet.
>>
>> Frequency-based attacks apply to a different threat. I'm referring to at
>> least two different attacks here, with different attackers and different
>> victims. Don't mix them up.
>>
>>
>>> Assuming that you have no attack on the key generation itself, the
>>
>> Not a safe assumption!
> 
> For this case it is a safe assumption.  For the same reason you're
> assuming the PSU isn't defective.

A defective power supply isn't a security threat. It doesn't enable me to
upload malicious files to an unsuspecting receiver.


> An attack on /dev/urandom for 
> instance would also compromise TLS, and every sort of secure key
> generation, not just the key for byte translation.  That is a separate
> problem altogether with a separate solution.

Er, yes? And how does that make it invalid?



>>> best you can do is send a file deobfuscated with a random key and hope
>>> that the recipient randomly chooses the same key; the odds of that
>>> happening are 1 in 256!.
>>
>> It's easy to come up with attacks which are no better than brute force.
>> It's the attacks which are better than brute force that you have to watch
>> out for.
>>
> 
> And that's why we're having this discussion.  Do you know of an attack
> in which you can control the output (say at least 100 consecutive bytes)
> for data which goes through a 256 byte translation table, chosen
> randomly from 256! permutations after the data is sent.  If you do, I'm
> all ears!  But at this point you're just setting up straw men and
> knocking them down.

Ah yes, the good ol' "any argument I do not have an answer for must be a
straw man" gambit.

Whatever you say dude. Obviously you've got this security thing down to a
fine art. After all, if you can't think of a way to break the system,
nobody else can either.



-- 
Steven



