my cryptogram program

Mon May 15 17:31:45 EDT 2006

John Salerno <johnjsal at NOSPAMgmail.com> writes:
> def convert_quote(quote):
>      coded_quote = make_code(quote)
>      author = coded_quote.split('|')[1]
>      quote = coded_quote.split('|')[0]
>      return quote, author

I think it's a little bit ugly (plus inefficient) to split the quote twice.
You can use:

  def convert_quote(quote):
     coded_quote = make_code(quote)
     quote, author = coded_quote.split('|')
     return quote, author
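
Here is a small sketch of the single-split version on its own, so you
can see the order of the unpacked names (index 0 is the quote, index 1
is the author).  The helper name split_coded_quote and the sample
string are made up for illustration, and I'm assuming the text contains
exactly one '|'.

```python
def split_coded_quote(coded_quote):
    # Split once and unpack both parts in one step.
    # Raises ValueError if there is not exactly one '|' in the text.
    quote, author = coded_quote.split('|')
    return quote, author


# Hypothetical sample: quote text, then '|', then the author.
sample_quote, sample_author = split_coded_quote("XYZ ABC|QRS")
```

A side effect of unpacking is that a malformed string (no '|', or more
than one) fails loudly instead of silently giving you the wrong pieces.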

> def make_code(original):
>      original_letters = make_set(original)
>      new_letters = list(string.ascii_uppercase)
>      while True:
>          random.shuffle(new_letters)
>          trans_letters = ''.join(new_letters)[:len(original_letters)]
>          if test_code(original_letters, trans_letters):
>              trans_table = string.maketrans(original_letters, trans_letters)
>              break
>      return original.translate(trans_table)

You're trying to make sure that no character maps to itself in the
cryptogram.  I'm not sure if that's one of the "rules".  If not, you
might like to know that it's a cryptographic weakness, not that you're
attempting strong cryptography ;-).  But the WW2 Enigma machine had
that characteristic as part of its design, and that was used to break it.

It also looks like upper and lower case letters in the input are
treated as separate.  For example, "George" might become "abcdeb"
while "george" becomes "abcdab".  Did you want that?  I don't think it
can be right, because "trans_letters" has at most 26 characters, but
there can be 52 separate original letters (26 upper and 26 lower).
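
To make the case issue concrete, here is a quick sketch that counts the
distinct letters in a mixed-case string.  The function name
distinct_letters and the sample text are mine, not from your program.

```python
import string


def distinct_letters(text):
    # Keep only ASCII letters; upper and lower case count separately.
    return {ch for ch in text if ch in string.ascii_letters}


# Hypothetical sample: a pangram in title case followed by lower case.
sample = ("The Quick Brown Fox Jumps Over The Lazy Dog; "
          "the quick brown fox jumps over the lazy dog")

mixed_case_count = len(distinct_letters(sample))           # more than 26
folded_count = len(distinct_letters(sample.upper()))       # at most 26
```

With case preserved the count goes past 26, so a 26-character
replacement alphabet can't cover it; folding to one case first keeps it
within 26.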

As for the efficiency of the above algorithm, well, look at what
happens after you shuffle the alphabet: the probability that any given
character maps to something other than itself is 25/26.  That means
the probability that N letters all map to something other than
themselves is roughly (25/26)**N, if you treat the letters as
independent.  If N=26, this is about 0.36, so on average
you'll shuffle about three times, which is not too bad, if you're just
doing something casual.  Note that this is close to 1/e.  I'll let you
figure out the reason.  Of course there is a chance you'll have to
shuffle 4 times, 10 times, 1000 times, etc.  You might like to
calculate those probabilities and decide whether it's worth thinking
up a more efficient algorithm, that's possibly more complex and
therefore more likely to have bugs, to burn additional development
time, etc.  How bad is it if occasional instances take a long time to
generate?  This is the kind of tradeoff you always have to spend time
pondering if you're doing a large-scale application.
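
If you want to put numbers on that, here is a rough sketch.  It uses
the same simplification as the estimate above (each shuffle succeeds
independently with probability (25/26)**26), so treat the results as
approximate.  The function name prob_at_least is just something I made
up.

```python
import math

ALPHABET_SIZE = 26

# Estimated chance that one shuffle leaves no letter mapped to itself.
p_success = (25 / 26) ** ALPHABET_SIZE

# Average number of shuffles until the first success (geometric distribution).
expected_shuffles = 1 / p_success


def prob_at_least(k, p=p_success):
    # Chance of needing k or more shuffles: the first k-1 must all fail.
    return (1 - p) ** (k - 1)


for k in (4, 10, 1000):
    print("P(at least %d shuffles) ~ %.3g" % (k, prob_at_least(k)))

# For comparison with the remark about 1/e.
print("p_success = %.4f, 1/e = %.4f" % (p_success, 1 / math.e))
```

The tail drops off geometrically, so very long runs are possible but
extremely unlikely.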

> def make_set(original):
>      original_set = set(original)
>      punc_space = string.punctuation + string.whitespace
>      for char in punc_space:
>          if char in original_set:
>              original_set.remove(char)
>      return ''.join(original_set)

I think I'd write something like:

  def make_set(original):
      return set(string.ascii_uppercase) & set(original.upper())

That is, convert the original string to uppercase, make a set from it,
and then intersect that set with the set of uppercase letters.  This
seems more direct.
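
A quick usage sketch, with a sample string of my own.  Note that this
version returns a set rather than a string, so if the caller needs a
string (as your make_code does), it has to join the result itself; the
sorted() call here is only to get a repeatable order.

```python
import string


def make_set(original):
    # Uppercase the text, then keep only the characters that are A-Z.
    return set(string.ascii_uppercase) & set(original.upper())


letters = make_set("Hello, World!")

# Sets are unordered; sort and join when a string is needed.
letters_string = ''.join(sorted(letters))
```

Punctuation and whitespace drop out automatically, because they are
not in the set of uppercase letters.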