Pure Python Data Mangling or Encrypting

Thu Jun 25 21:01:19 EDT 2015

On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens
> <jon+usenet at unequivocal.co.uk> wrote:
>>> There are only 256 possible values for n, one of which doesn't transform the
>>> data at all (ROT-0). If you're thinking of attacking this by pencil and
>>> paper, 255 transformations sounds like a lot. For a computer, that's barely
>>> harder than a single transformation.
>>
>> Well, it means you need to send 256 times as much data, which is a
>> start. If you're instead using a 256-byte translation table then
>> an attack becomes utterly impractical.
>>
>
> Utterly impractical? Maybe, if you attempt a pure brute-force approach
> - there are 256! possible translation tables, which is roughly e500
> attempts [1], and at roughly four a microsecond [2] that'd still take
> a ridiculously long time. But there are two gigantic optimizations you
> could do. Firstly, there are frequency-based attacks, and byte value
> duplicates will tell you a lot - classic cryptographic work. And
> secondly, you can simply take the first few bytes of a file - let's
> say 16, although a lot of files can be recognized in less than that.
> Even if there are no duplicate bytes, that'd be a maximum of 16!
> translation tables that truly matter, or just 2e13. At the same speed,
> that makes about a million seconds of computing time required. Divide
> that across a bunch of separate computers (the job is embarrassingly
> parallel after all), and you could get that result pretty easily. Cut
> the prefix to just 8 bytes and you have a mere 40K encryption keys to
> try - so quick that you wouldn't even see it happen. Nope, a simple
> substitution cipher is still not secure. Even the famous Enigma
> machine was a lot more than just letter-for-letter substitution - a
> double letter in the cleartext wouldn't be represented by a double
> letter in the result - and once the machine's secrets were figured
> out, the day's key could be reassembled fairly readily.

You're making the same mistake that Steven did in misunderstanding the
threat model. The goal isn't to prevent the attacker from working out
the key for a file that has already been obfuscated. Any real data
that might be exposed by a vulnerability in the server is presumed to
have already been strongly encrypted by the user.

The goal is to prevent the attacker from guessing a key that hasn't
even been generated yet, which could be exploited to engineer the
obfuscated content into something malicious. There are no
frequency-based attacks possible here, because you can't do frequency
analysis on the result of a key that hasn't even been generated yet.
Assuming that you have no attack on the key generation itself, the
best you can do is send a file deobfuscated with a random key and hope
that the recipient randomly chooses the same key; the odds of that
happening are 1 in 256!.

That said, I do see a potential weakness here: if the attacker can
create a malicious payload using only a subset of the 256 possible
byte values, then the odds of getting a correct key are increased,
since multiple keys will work. For an extreme example, if the attacker
can manage to craft a malicious payload that uses only the two byte
values 32 and 47, then the probability of getting a key that will
obfuscate to that is increased to 1 in 256! / 254!, or 1 in 65280. If
they distribute 65280 copies of that payload to various recipients,
then they can expect that one recipient on average will get the
payload in its malicious form.