Encryption with Python?

Mon May 9 18:12:20 EDT 2005

Paul,

      I thank you too for your response. Let me just tell you what goes
through my mind.

----- Original Message -----
From: "Paul Rubin" <"http://phr.cx"@NOSPAM.invalid>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, May 09, 2005 9:45 PM
Subject: Re: Encryption with Python?

> "Anthra Norell" <anthra.norell at tiscalinet.ch> writes:
> > The non-randomness of the difference is evidence of having guessed the
key,
> > right? Why then do I need two samples? If I hack away at a single sample
I
> > get a probably more conspicuous non-randomness twice as fast.
>
> No.  Let's say you encrypt two ascii strings with the same key.  The
> high bit of each byte in the plaintext is zero.  Therefore if you xor
> the two ciphertexts together, the high bit of each byte of the result
> xor'd ciphertexts will be zero.  So just check for that and you've
> immediately spotted the non-randomness.

I don't follow. There is no bitwise correlation between a plain-text
character and its encoded equivalent. What's more, there is no detectable
correlation at all. Take a highly ordered plain text, such as a sting of
identical characters, e.g. 'AAAAAAAAAAAAAAA' and sequentially add a
deterministically generated random series (module 256 to keep them within
range), what you get is, quite obviously, a series of numbers that differs
from the added random series in only one respect: all values are shifted by
ord ('A'). The intervals from one byte to the next remain unchanged,
allowance made for the module 256 wrap-around. The intervals remaining
unchanged, the distribution and hence the randomness of the encryption
remains unchanged. Quite obviously, each one of the identical plain-text
characters very likely will be encrypted differently. Repeats would occur,
but they would occur randomly once every 256 times on an average.

> > I don't doubt that, given a series (long enough), the postion can be
> > derived. I doubt, though, that a series is knowable, if another,
unknown,
> > series has been added to it.
>
> You have to assume that the attacker has access to known
> plaintext-ciphertext pairs.  For example, you might not tell someone
> the password you use now, but what about some old password that you
> don't use any more?  If the attacker knows your old password (maybe
> because your sysop set it to some default value and had you change it
> on your first login), and has the encrypted version, there's a known
> plaintext.

Password management is certainly a problem, but of course is totally
unrelated to the quality of an encryption method.

> > I thought the problem was concealing passwords from ones kids or
> > collaborators.
>
> Encryption is supposed to conceal data from knowledgable attackers
> willing to burn significant resources to get at the data.  That mightof
> or might not describe your friends and collaborators.

I agree. Depending on a situation, a solution might or might not be
adequate.

> > I believe that a randomly distributed series utterly obliterates any
> > non-randomness or regularity of a second series, if the two are added.
>
> This is a meaningless statement since you don't give any definition of
> "randomly distributed series".

No, in fact I don't. I am quite confident that the library module 'random'
produces random distributions. As to the distribution of a non-random series
added to a random series, my intuition tells me that it remains random.

> > I don't know how to produce a formal proof. It is just a hunch. It's
> > actually more than a hunch. It is a conviction. Not a certainty; a
> > conviction. I'd be delighted to be proved wrong (or right).

I don't think it would be difficult for a mathematician to prove or disprove
the hypothesis. I did come up with an informal proof. It is a function I
will add at the bottom of this message. You can copy and run it, if you have
the Image package installed.

>
> If the keystream really can't be distinguished from random, then correct,
> though there's still issues with key management (you mustn't use the same
> key twice) and authentication.

The key is the seed of the random generator.

> Generating keystreams that are indistinguishable from random is an
> extremely tricky subject, there are books and papers written about it,
etc.

I agree. I wouldn't know how to design a random generator. Fortunately I
don't need to. I can use ready made ones.

> > The fact may be significant that we module overflow back into the
> > range.  So, if the distribution of my code is indeed random,

> Your code used the Python built-in PRNG algorithm which is designed to
> make output with similar statistical properties as actual random numbers,
> for the purpose of running stuff like simulations.  It makes no attempt
> at all to be secure against attackers trying to figure out whether the
> output is really random.

Try out the following function. You need the Image package.

Regards

Frederic

###############################################

def informal_proof_of_randomness (text_file_name):    # File must be longer
than 60K

   X = 0
   Y = 1

   import random

   NUMBER_OF_DOTS = 30000

   # Make three lists to collect coordinate pairs
   random_coordinates       = NUMBER_OF_DOTS * [None]
   plain_text_coordinates   = NUMBER_OF_DOTS * [None]
   encoded_text_coordinates = NUMBER_OF_DOTS * [None]

   # Fill one with random numbers
   for i in xrange (NUMBER_OF_DOTS):
      x = random.randint (0,255)
      y = random.randint (0,255)
      random_coordinates [i] = ((x,y))

   # Fill a second one with ascii codes (plain text)
   f = file (text_file_name, 'rb')
   for i in xrange (NUMBER_OF_DOTS):
      # No check if file runs out prematurely. Make sure it's at least 60K
      x = ord (f.read (1))
      y = ord (f.read (1))
      plain_text_coordinates [i] = ((x,y))
   f.close ()

   # Fill a third one with the sum of the two previous ones (proposed cipher
text)
   for i in xrange (NUMBER_OF_DOTS):
      encoded_text_coordinates [i] = \
         (random_coordinates [i][X] + plain_text_coordinates [i][X]) % 256,
\
         (random_coordinates [i][Y] + plain_text_coordinates [i][Y]) % 256

   import Image

   WHITE = 1
   BLACK = 0

   # Make three images to visualize the distribution
   random_image       = Image.new ('1', (256,256), WHITE)
   text_image         = Image.new ('1', (256,256), WHITE)
   encoded_text_image = Image.new ('1', (256,256), WHITE)

   for coordinate in random_coordinates:
      random_image.putpixel (coordinate, BLACK)

   for coordinate in plain_text_coordinates:
      text_image.putpixel (coordinate, BLACK)

   for coordinate in encoded_text_coordinates:
      encoded_text_image.putpixel (coordinate, BLACK)

   # Visualize the distributions
   random_image.show ()         # Looks like rain on a pavement
   text_image.show ()           # Does not at all
   encoded_text_image.show ()   # See for yourself

##############

> http://mail.python.org/mailman/listinfo/python-list