Pure Python Data Mangling or Encrypting

Thu Jun 25 14:58:11 EDT 2015

Thanks Jon.  I couldn't have answered those questions better myself, and 
I wrote the software in question.

I didn't intend to describe the entire system, but rather just enough of 
it to present the issue at hand.  You seem to understand it quite well.

I'm now using a randomly generated 256-byte translation table, which 
performs very well on the lowly Raspberry Pi ARM chip.  The Raspberry Pi 
is to be my recommended storage node platform.
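
For illustration, here is a minimal sketch of that kind of table.  It is 
not the code from the storage node itself: the names make_tables and 
mangle are hypothetical, and drawing the permutation from 
random.SystemRandom is an assumption about how the key might be chosen.

```python
import random


def make_tables(rng=None):
    """Return (encode, decode): a random 256-byte table and its inverse."""
    # Assumption: the OS entropy source is used so the sender cannot
    # predict the table. The real key selection is not shown in this thread.
    rng = rng or random.SystemRandom()
    values = list(range(256))
    rng.shuffle(values)
    encode = bytes(values)

    # Build the inverse table so the original bytes can be recovered.
    decode = bytearray(256)
    for plain, mangled in enumerate(encode):
        decode[mangled] = plain
    return encode, bytes(decode)


def mangle(data, table):
    """Map every byte of data through a 256-byte translation table."""
    return data.translate(table)
```

The receiver would call make_tables() once per received file, write 
mangle(data, encode) to disk, and keep decode so the bytes can be 
restored when the chunk is requested.  bytes.translate does the mapping 
in C, which is presumably why it holds up on a small ARM board.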

For those who care, the storage system is something like Amazon S3, 
except that storage is distributed peer to peer.  Clients compress, 
encrypt, and chunk data, then send it to storage nodes.  Storage nodes 
propagate the data.  Encryption and authentication are handled through 
TLS.  Files use AES encryption for storage.  Storage nodes are monitored 
for availability, integrity, and performance.  Data transfers are 
coordinated by a centralized service which tracks storage and transfers.  
Redundancy is configurable by chunk.  Storage nodes are compensated for 
storage x time.  Uploads and downloads can utilize several storage nodes 
simultaneously to increase throughput.
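
As a rough sketch of the client-side pipeline described above, and not 
the real client: the function names, the 64 KiB chunk size, and the 
per-chunk SHA-256 digest are all assumptions made for this example.  The 
AES step is deliberately left out because it needs a third-party library.

```python
import hashlib
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not the real setting


def prepare_chunks(data, chunk_size=CHUNK_SIZE):
    """Compress data, then split it into (sha256_hex, chunk) pairs."""
    compressed = zlib.compress(data)
    # Assumption: AES encryption of `compressed` would happen here,
    # before chunking. It is omitted from this sketch.
    chunks = []
    for offset in range(0, len(compressed), chunk_size):
        chunk = compressed[offset:offset + chunk_size]
        # The digest lets a chunk be checked for integrity later.
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks


def reassemble(chunks):
    """Verify each chunk's digest, then join and decompress the data."""
    for digest, chunk in chunks:
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk failed integrity check")
    return zlib.decompress(b"".join(chunk for _, chunk in chunks))
```

Each (digest, chunk) pair could then be sent to a different storage 
node, which is what would let uploads and downloads run in parallel.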

-Randall

On 06/25/2015 10:26 AM, Jon Ribbens wrote:
> On 2015-06-25, Steven D'Aprano <steve at pearwood.info> wrote:
>> On Thu, 25 Jun 2015 08:03 pm, Jon Ribbens wrote:
>>> That won't stop virus scanners etc potentially making their own minds
>>> up about the file.
>>
>> *shrug* Sure, but I was specifically referring to the risk of the malware
>> being executed, not being detected by a virus scanner.
>>
>> Encrypting the file won't even necessarily stop the virus scanner from
>> finding false positives. It might even increase the chances.
>
> That seems spectacularly unlikely.
>
>> But it will prevent the virus scanner from finding actual viruses.
>> You may or may not consider that a problem.
>
> The OP would consider it a benefit.
>
>> I didn't say it ought to be encrypted by the receiver. Obviously the
>> encryption needs to be done in a way that the recipient doesn't get access
>> to the key.
>
> No, you're still misunderstanding. The encryption needs to be done in
> a way that the *sender* doesn't get access to the key. The recipient
> has access to it by definition because the recipient chooses it.
>
>> The obvious way to do that is for the application to encrypt the
>> data before it sends it.
>
> Yes, he already said the application does that. The problem is,
> what if the sender is not the genuine application but is instead
> a malicious attacker?
>
>> Then the receiver just writes the encrypted bytes directly to a file.
>
> That's precisely what he's trying to avoid.
>
>> That would have the benefit of protecting against man-in-the-middle
>> attacks as well, since the file is never transmitted in the clear.
>
> With what he's talking about, the file after encryption is never
> transmitted *at all*.
>
>> I've been arguing that the application *should* encrypt the file, and not
>> mess about giving the illusion of security.
>
> You haven't understood the threat model.
>
>> But seriously, they have the application. If the application is using a
>> symmetric substitution cipher, it needs the key (because there is only
>> one), so the receiver will have the cipher.
>
> There is not only one key. The recipient would invent a new key for
> each file after the file is received.
>
>> With the sort of substitution cipher the OP is experimenting with, forcing a
>> particular result is trivially easy. The sender has access to the
>> application, knows the cipher, knows the key, and can easily generate a
>> file which will generate whatever content the sender wants after being
>> obfuscated.
>
> No, because the sender does not know the key.
>
>>> Replace "ROT-13" with "ROT-n" where 'n' is a secret known only to the
>>> receiver, and suddenly it's not such a bad method of obfuscation.
>>
>> There are only 256 possible values for n, one of which doesn't transform the
>> data at all (ROT-0). If you're thinking of attacking this by pencil and
>> paper, 255 transformations sounds like a lot. For a computer, that's barely
>> harder than a single transformation.
>
> Well, it means you need to send 256 times as much data, which is a
> start. If you're instead using a 256-byte translation table then
> an attack becomes utterly impractical.
>
>>> Improve it to the random-translation-map method he's actually using
>>> and you've got really quite a reasonable system.
>>
>> No, truly you haven't. The OP is experimenting with bytearray.translate,
>> which likely makes it a monoalphabetic substitution cipher, and the
>> techniques for cracking those go back to the 9th century AD.
>
> Only if you have the ciphertext, which the attacker in this scenario
> does not. The attacker gets to set the plaintext, knows the algorithm,
> does not know the key (unless the method of choosing the key has a
> flaw), and wants to set the ciphertext to some specific string.
> Frequency analysis doesn't even begin to apply to this scenario.
>
>> You're relying on security by obscurity
>
> No, he really isn't.
>
>>> The use case is pretty obvious (a peer-to-peer dropbox type thing) but
>>> it does appear to be being misunderstood. This isn't actually a crypto
>>> problem at all and "users taking short-cuts" isn't an issue.
>>
>> Yes it is. If users don't properly pre-encrypt their files before sending it
>> out to the cloud, AND THEY WON'T,
>
> Yes they will. He said his application encrypts the files for them,
> presumably he is indeed using "proper crypto" for that.
>
>> receivers WILL be able to read those files,
>
> That's a problem for the sender not the receiver.
>