Pure Python Data Mangling or Encrypting

Steven D'Aprano steve at pearwood.info
Thu Jun 25 11:13:02 EDT 2015


On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:

> On 2015-06-25, Steven D'Aprano <steve+comp.lang.python at pearwood.info>
> wrote:
>> On Thursday 25 June 2015 14:27, Devin Jeanpierre wrote:
>>> If it's encrypted malware, and you can't decrypt it, there's no threat.
>>
>> If the *only* threat is that the sender will send malware, you can
>> mitigate around that by dropping the file in an unencrypted container.
>> Anything good enough to prevent Windows from executing the code,
>> accidentally or deliberately, say, a tar file with a custom extension.
> 
> That won't stop virus scanners etc potentially making their own minds
> up about the file.

*shrug* Sure, but I was specifically referring to the risk of the malware
being executed, not being detected by a virus scanner.

Encrypting the file won't even necessarily stop the virus scanner from
finding false positives. It might even increase the chances. But it will
prevent the virus scanner from finding actual viruses. You may or may not
consider that a problem.


>> But encrypting the file is also a good solution, and it prevents the
>> storage machine spying on the file contents too. Provided the encryption
>> is strong.
> 
> How would the receiver encrypting the file after receiving it prevent
> the receiver from seeing what's in the file?

I didn't say it ought to be encrypted by the receiver. Obviously the
encryption needs to be done in a way that the recipient doesn't get access
to the key. The obvious way to do that is for the application to encrypt
the data before it sends it. Then the receiver just writes the encrypted
bytes directly to a file. That would have the benefit of protecting against
man-in-the-middle attacks as well, since the file is never transmitted in
the clear.

 
>>> The original post said that the sender will usually send files they
>>> encrypted, unless they are malicious. So if the sender wants them to
>>> be encrypted, they already are.
>>
>> The OP *hopes* that the sender will encrypt the files. I think that's a
>> vanishingly faint hope, unless the application itself encrypts the file.
> 
> Yes, the application itself encrypts the file. Haven't you been
> reading what he's saying?

I have been reading what the OP has been saying. I'm not sure if you have
been. The OP doesn't want to encrypt the file, because he wants the
application to be pure Python and encryption in pure Python is too slow. So
he wants to obfuscate it with some sort of substitution cipher or
equivalent, which may be easily crackable by anyone who really wants to.

I've been arguing that the application *should* encrypt the file, and not
mess about giving the illusion of security.


>> The sender has a copy of the application? Then they can see the type of
>> obfuscation used. If they know the key, or can guess it, they can take
>> their malware, *decrypt* it, and send that, so that *encrypting* that
>> file puts the malicious code on the disk.
> 
> Not if they don't know the key they can't.

"If they know the key, or can guess it, ..."
"Not if they don't know the key they can't."

Really? Glad you're around to point that out to me.

But seriously, they have the application. If the application is using a
symmetric substitution cipher, it needs the key (because there is only
one), so whoever has the application has the key as well as the cipher.

With the sort of substitution cipher the OP is experimenting with, forcing a
particular result is trivially easy. The sender has access to the
application, knows the cipher, knows the key, and can easily craft a file
which turns into whatever content the sender wants after being obfuscated.
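
To make that concrete, here is a minimal sketch. It assumes the obfuscation
is a 256-entry translation table applied with bytes.translate; the table,
the seed and the function names are invented for illustration and are not
the OP's actual code. If the sender knows the table, inverting it is one
call to bytes.maketrans:

```python
import random

# Hypothetical stand-in for the application's table: a random
# permutation of all 256 byte values, in the form bytes.translate expects.
rng = random.Random(2015)
values = list(range(256))
rng.shuffle(values)
table = bytes(values)

# The inverse table maps each obfuscated byte back to the byte it came from.
inverse = bytes.maketrans(table, bytes(range(256)))


def obfuscate(data: bytes) -> bytes:
    """What the receiving application is assumed to do before writing to disk."""
    return data.translate(table)


def craft(desired: bytes) -> bytes:
    """What a sender who knows the table sends so that `desired` ends up on disk."""
    return desired.translate(inverse)


wanted_on_disk = b"MZ\x90\x00 pretend this is an executable"
to_send = craft(wanted_on_disk)

# After the receiver "protects" the upload, the payload is back in the clear.
assert obfuscate(to_send) == wanted_on_disk
```

The whole attack is a single table lookup per byte, which is why knowing
(or guessing) the table defeats the scheme.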

Modern ciphers like AES (which is a symmetric cipher, by the way) are quite
resistant to that sort of attack. There is, so far as I know, no way for
somebody who doesn't have the key to generate a file which results in a
specific content after encryption.


>> E.g. suppose I want to send you an insult, but I know your program
>> automatically ROT-13s the strings I send you. Then I send you:
>>
>> 'lbhe sngure fzryyf bs ryqreoreevrf'
>>
>> and your program ROT-13s it to:
>>
>> 'your father smells of elderberries'
>>
>> I know that the OP doesn't propose using ROT-13, but a classical
>> substitution cipher isn't that much stronger.
> 
> Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
> receiver, and suddenly it's not such a bad method of obfuscation.

There are only 256 possible values for n, one of which doesn't transform the
data at all (ROT-0). If you're thinking of attacking this by pencil and
paper, 255 transformations sounds like a lot. For a computer, that's barely
harder than a single transformation.
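
A rough sketch of how little work that is, assuming ROT-n means "add n to
every byte, modulo 256". The function names and the crude "does it look
like text?" test are my own inventions for the example:

```python
def rot_n(data: bytes, n: int) -> bytes:
    """Add n to every byte, wrapping around at 256."""
    return bytes((b + n) % 256 for b in data)


def looks_like_text(data: bytes) -> bool:
    """Crude plausibility check: printable ASCII plus tab, LF and CR only."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)


def crack_rot_n(ciphertext: bytes) -> list:
    """Try all 256 shifts and keep the ones that yield plausible text."""
    candidates = []
    for n in range(256):
        plain = rot_n(ciphertext, -n)  # undo a shift of n
        if looks_like_text(plain):
            candidates.append((n, plain))
    return candidates


secret_n = 200  # known only to the "receiver"
ciphertext = rot_n(b"your father smells of elderberries", secret_n)

for n, plain in crack_rot_n(ciphertext):
    print(n, plain)
```

A filter this crude can let a handful of wrong shifts through as well as the
right one, but picking the real plaintext out of a short list is no
obstacle. For files that aren't text you would swap in a different check,
such as looking for a known file header.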


> Improve it to the random-translation-map method he's actually using
> and you've got really quite a reasonable system.

No, truly you haven't. The OP is experimenting with bytearray.translate,
which likely makes it a monoalphabetic substitution cipher, and the
techniques for cracking those go back to the 9th century AD. That's over a
thousand years of experience in cracking these things.

The situation is a bit harder than with the traditional ciphers: instead
of an alphabet of 26 letters we have one of 256 byte values. But that's
only an order of magnitude bigger, and the cipher is still vulnerable to
frequency analysis and other attacks.
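
Here is a small illustration of why frequency analysis works. It is only a
sketch: the table, the seed and the sample text are made up, and a real
attack would go on to recover the rest of the table using digraphs, known
file headers and so on. The point is that a monoalphabetic substitution
relabels the bytes but leaves the shape of the histogram untouched:

```python
import random
from collections import Counter

# Hypothetical secret table: a random permutation of the 256 byte values.
rng = random.Random(42)
values = list(range(256))
rng.shuffle(values)
table = bytes(values)

plaintext = (b"the quick brown fox jumps over the lazy dog and then "
             b"the dog chases the fox all the way back to the den ") * 20
ciphertext = plaintext.translate(table)

# Same counts, different labels: the sorted histograms are identical.
plain_counts = sorted(Counter(plaintext).values(), reverse=True)
cipher_counts = sorted(Counter(ciphertext).values(), reverse=True)
assert plain_counts == cipher_counts

# In English text the space is usually the most common byte, so the
# attacker's first guess is that the most common ciphertext byte is a space.
top_cipher_byte, _ = Counter(ciphertext).most_common(1)[0]
guess_is_right = (table[ord(" ")] == top_cipher_byte)
print("first guess correct:", guess_is_right)
```

Every correct guess like this one pins down another entry of the table, and
each entry recovered makes the next one easier.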

The only positive to this scheme is that the "encryption" is so weak (it's
been effectively obsolete since World War 2, if not before it) that you
might find it hard to find ready-made cracking tools for it unless you work
for the NSA, CIA or similar. You're relying on security by obscurity:
nobody uses this sort of thing any more, because it's so insecure, and that
obscurity does give you a *tiny* bit of security against a casual,
unmotivated attacker.

But once this system starts getting popular, that obscurity will not last.
It won't be difficult to build fast cracking programs that will break the
so-called "encryption", if it is based on a classical symmetric
monoalphabetic substitution cipher.

Here's an online tool which can be used for cracking "encrypted" English
text:

http://www.simonsingh.net/The_Black_Chamber/substitutioncrackingtool.html

You obviously wouldn't use that specific site on arbitrary files, but it
demonstrates that these classical ciphers are *not* secure.


>>> I am usually very oppositional when it comes to rolling your own
>>> crypto, but am I alone here in thinking the OP very clearly laid out
>>> their case?
>>
>> I don't think any of us *really* understand his use-case or the potential
>> threats, but to my way of thinking, you can never have too strong a
>> cipher or underestimate the risk of users taking short-cuts.
> 
> The use case is pretty obvious (a peer-to-peer dropbox type thing) but
> it does appear to be being misunderstood. This isn't actually a crypto
> problem at all and "users taking short-cuts" isn't an issue.

Yes it is. If users don't properly pre-encrypt their files before sending
them out to the cloud, AND THEY WON'T, receivers WILL be able to read those
files, half-arsed attempts to "encrypt" them or not.

The solution to all(?) these security problems is for the application to
handle the encryption, using a modern crypto library. But the OP doesn't
want to do that because it's too slow when written as pure Python.



-- 
Steven



