[Edu-sig] Re: Edu-sig digest, Vol 1 #172 - 5 msgs

Seth David Schoen schoen@loyalty.org
Tue, 21 Nov 2000 11:34:28 -0800


Dorothea Salo writes:

> > def permute():
> >     """
> >     Randomly permute the uppercase alphabet
> >     by choosing its letters pseudo-randomly
> >     """
> >     alphalist = list(string.uppercase)
> >     newlist = []
> >     for i in range(len(alphalist)):
> >         randchar = random.choice(alphalist)
> >         alphalist.remove(randchar)
> >         newlist.append(randchar)
> >     return newlist
> >
> > def mkdict():
> >     """
> >     Pair uppercase alphabet with randomly permuted
> >     version of same
> >     """
> >     tuples = zip(string.uppercase,permute())
> >     codedict = {}
> >     for pair in tuples:
> >         codedict[pair[0]]=pair[1]
> >     return codedict
> 
>     The above algorithms, if I'm reading them correctly, allow a character
> to be substituted by itself. I assume this is desirable behavior (since it
> increases the number of permutations), but perhaps someone could comment on
> how often self-substitution could be expected to happen, and how likely it
> might be that self-substitution would cause the "encrypted" result to be
> easier to human-decode. (Self-substitution of "e" seems more likely to ease
> decryption than self-substitution of "z," but I'm just guessing.)

Self-substitution happens at all with probability 1-(1/e).

It might help human attackers in some circumstances, if they get
extraordinarily lucky.  But in general, there's no reason to assume
that self-substitution leaks more information about the message.

If you see that the most common letter in a cyphertext is "M", do
you assume that "M" is "E"?

If you see that the most common letter in a cyphertext is "E", do
you assume that "E" is "E"?

If your answer to the two questions is different, then you're implying
that you think the cyphertext alphabet is not completely arbitrary.
This might be a good assumption in dealing with substitution cyphers
created by humans by hand (in the sense that people are terrible at
generating random numbers), but it's probably not a good assumption in
dealing with an automatically generated cypher.

Remember that by forbidding self-substitution, you eliminate almost
2/3 of the possible permutations, so you more than _double_ the amount
of information available about the decryption process.  (OK, that's
not quite fair, because you don't usually have to find the entire
alphabet in order to recover the plaintext.)  Why should you give any
extra clues?

I can rephrase this another way.

Suppose I have a cyphertext "QTQRQ ZCMYO".

So originally this is just saying: there is a phrase of two words of
five letters each written with the Latin alphabet.  The first, third,
and fifth letters of the first word are identical to each other, and
different from all other letters in the phrase.  The second letter
of the first word is unique (not repeated anywhere else in the
phrase).  So is the fourth letter of that word.  So is each letter
of the second word.

Do you see that this is _all_ the information given by the cyphertext?

Now specifying a non-self-substitution rule adds the following
additional information, which was not conveyed in any way by the
cyphertext: The first letter of the first word is not "Q".  The second
letter of that word is not "T".  The fourth letter of that word is not
"R".  The first letter of the second word is not "Z".  The second
letter of that word is not "C".  The third letter of that word is
not "M".  The fourth letter of that word is not "Y".  The fifth
letter of that word is not "O".

There are potentially very significant hints!  But there was
absolutely no way to infer them directly from the original cyphertext,
without reference to knowledge of any languages that can be written
in the Latin alphabet.

Another way to say this is that if you somehow have the vast file
/usr/share/plaintexts/parsimonious-decryptions, which contains all
possible parsimonious decryptions of every cyphertext :-), and you
are trying to write a program to search this file and print out
all possible parsimonious decryptions for a given cyphertext, the
hints you get from the non-self-substitution rule will potentially
make your program faster and make it print out fewer possible
decryptions.

-- 
Seth David Schoen <schoen@loyalty.org>  | And do not say, I will study when I
Temp.  http://www.loyalty.org/~schoen/  | have leisure; for perhaps you will
down:  http://www.loyalty.org/   (CAF)  | not have leisure.  -- Pirke Avot 2:5