Python for Reverse Engineering

Andrew Dalke adalke at mindspring.com
Thu Nov 11 07:16:22 EST 2004


James S wrote:
> I got a few questions about the strings, so I figured I would share the 
> answers.


I decided to play around with it for a bit.


The first character of each group of 5 (positions 0, 5, 10, 15 and
20) has less diversity.  Here's the set of distinct characters seen
at each position.

  0 0456BCDJKLQRSTYZ
  1 0123456789BDFGHJKLMNPQRSTVWXYZ
  2 023456789BCDFGHJKLMPQRSTVWXY
  3 012345679BCDFGHJKLMNPQRSTVWXYZ
  4 012345679BCDFGHJKNPQRSTXYZ
  5 1235679BCGKLNPQTVXYZ
  6 0123456789BCDFGHJKLMNPRSTVWXZ
  7 0123456789CDFGHJKLMNPQSTVWXYZ
  8 0123456789BCDFGHJKLMNPQRSTVWXY
  9 0123456789BCFGHJKLMNPQRSTVWXYZ
10 01235678CDFGHKLMNPQSTWYZ
11 0123456789BCDFGHJKLMNPQRSTVWXYZ
12 0123456789BCDGHJKLMNPQRSTVWXYZ
13 0123456789BCDFGHJKLMNPQRSTVWXYZ
14 0123456789BCDFGHJKLMNPQRSTVWXYZ
15 0345678BDFHKLMNQSTVWY
16 0123456789BCDFGHJKLMNPQRSTVWXY
17 0123456789BCDGHJKLMNPQRSTVWXZ
18 012345679BCDFGHJKLMNPQRSTVWXYZ
19 0123456789BCDFGHJKLMNPQRSTVXYZ
20 046789BGHJKLMNRTVWXYZ
21 013456789BCDFGHJKLMNPQRSTVWXYZ
22 123456789BCDFGHJKLMNPQRSTVWYZ
23 0123456789BCDFHJKLMNPQRSTVWXYZ
24 012345678BCDFGHJKLMNPQRSTVWXYZ
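Something like the following would produce a table like that.  It is a
minimal sketch that assumes the keys are available as strings in the
XXXXX-XXXXX-XXXXX-XXXXX-XXXXX form.  The function name
position_alphabets is hypothetical.  The two keys quoted further down
serve as sample input.

```python
def position_alphabets(keys):
    """For each character position (dashes removed), return the sorted
    string of distinct characters seen there across all keys."""
    stripped = [key.replace("-", "") for key in keys]
    seen = [set() for _ in range(len(stripped[0]))]
    for key in stripped:
        for pos, ch in enumerate(key):
            seen[pos].add(ch)
    # sorted() uses ASCII order, so digits come before capital letters.
    return ["".join(sorted(chars)) for chars in seen]


keys = [
    "5SFHJ-GHP1B-FCSCQ-KBCQB-86J50",
    "C9W27-N2SJ3-8HB1P-4VD0F-R9QXL",
]
for pos, chars in enumerate(position_alphabets(keys)):
    print(f"{pos:2d} {chars}")
```

With the full set of keys instead of two, the output has the same
layout as the table above.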


There are 31 characters in the encoding.  They are the
digits 0-9 and the consonants B-Z (including Y).  I suspect
the vowels were omitted so no English words would be
generated by accident.
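The alphabet itself is easy to spell out.  In this sketch the order of
the characters is my assumption, not something known about the real
program.  The same goes for any character-to-number mapping built on
that order.

```python
import string

VOWELS = set("AEIOU")

# Digits, then the 21 consonants in alphabetical order (Y counts as one).
# This ordering is an assumption; the real program may map characters differently.
ALPHABET = string.digits + "".join(
    ch for ch in string.ascii_uppercase if ch not in VOWELS
)

print(len(ALPHABET), ALPHABET)
```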

This gives a maximum space of 31**25 or
   19232792489931358333837313998767870751
values.

Given

 > There are 28629150 valid keys in the space of 28629150^5

then that's
   19232789130978948891037625982187500000
or smaller than 31**25 by
          3358952409442799688016580370751
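Python's arbitrary-precision integers make these figures easy to
check.  The variable names are just for illustration.

```python
raw_space = 31 ** 25          # every 25-character string over 31 symbols
key_space = 28629150 ** 5     # the valid-key count from the quote, to the 5th power

print(raw_space)
print(key_space)
print(raw_space - key_space)
```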

124 bits are needed to encode the full space because

 >>> 2**123 < 28629150**5 < 2**124
True
 >>>

However it's also the case that

 >>> 2**123 < 31**25 < 2**124
True
 >>>
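The same comparison can be made with int.bit_length().  This is a small
sketch, and bits_needed is a hypothetical helper name.

```python
def bits_needed(n):
    """Bits required to give each of n distinct values its own code (0..n-1)."""
    return (n - 1).bit_length()


print(bits_needed(28629150 ** 5))  # 124
print(bits_needed(31 ** 25))       # 124
```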

This means almost nothing is available for any checksum:
both values are about 2**123.85, and log2(31**25) exceeds
log2(28629150**5) by only about 2.5e-7 bits.  This matters
because you say

> There is a checksum encoded in the string that provides a basic check. 
> There is a unique number encoded in each string that represents each 
> possible key.

If all possible keys in the space can be encoded by
this scheme then there's no extra space for a checksum.
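To put a number on how little room is left, compare the two spaces on a
log scale.  This sketch uses math.log2, which accepts Python ints of
any size.

```python
import math

raw_space = 31 ** 25
key_space = 28629150 ** 5

print(math.log2(raw_space))   # roughly 123.855
print(math.log2(key_space))   # roughly 123.855
# Bits left over for anything besides the key itself (a checksum, say).
# Dividing first keeps the tiny difference from being swamped by rounding.
slack_bits = math.log2(raw_space / key_space)
print(slack_bits)             # roughly 2.5e-07
```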


I therefore don't know what that number means for this problem.

Supposing there is a checksum, it's probably the first character
of each group.  The reduced diversity at those positions is a clue,
though the checksums I know don't have that effect.
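One way to test the first-character idea is to try candidate checksum
functions against the known keys.  This is a minimal harness, not
anything known about the real program.  The character values, the
function names and the plain_sum guess are all hypothetical.

```python
# Assumed 31-character alphabet; the value given to each character is a guess.
ALPHABET = "0123456789BCDFGHJKLMNPQRSTVWXYZ"
VALUE = {ch: i for i, ch in enumerate(ALPHABET)}


def checksum_hits(keys, candidate):
    """Count groups whose first character equals candidate(rest, group_index).

    `candidate` is any hypothetical function mapping the last four
    characters of a group (and the group's index, 0-4) to one character."""
    hits = total = 0
    for key in keys:
        for index, group in enumerate(key.split("-")):
            total += 1
            if candidate(group[1:], index) == group[0]:
                hits += 1
    return hits, total


def plain_sum(rest, index):
    # Hypothetical guess: sum of character values mod 31, ignoring order and index.
    return ALPHABET[sum(VALUE[ch] for ch in rest) % len(ALPHABET)]


keys = [
    "5SFHJ-GHP1B-FCSCQ-KBCQB-86J50",
    "C9W27-N2SJ3-8HB1P-4VD0F-R9QXL",
]
print(checksum_hits(keys, plain_sum))
```

On the two sample keys the plain_sum guess matches none of the ten
groups.  That rules out only this particular guess.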


I can't tell if transpositions change the checksum.  The only
anagram group is

#59 5SFHJ-GHP1B-FCSCQ-KBCQB-86J50
           ^^^^
#37 C9W27-N2SJ3-8HB1P-4VD0F-R9QXL
                 ^^^^

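The two runs sit in different groups (the second group of #59 and
the third of #37), and those groups start with different characters
(G and 8).  So either transpositions change the checksum, or there
is a seed to the checksum which changes for each group number, or
the mapping from characters to numbers changes based on the index
in the string.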
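Finding such pairs is a matter of grouping on the sorted characters.
This is a minimal sketch that compares the four characters after each
group's first character, as the marked runs suggest.  The function
name is hypothetical.

```python
from collections import defaultdict


def anagram_groups(keys):
    """Map the sorted last four characters of each 5-character group to
    every (key index, group index, group) where they occur; keep only
    multisets that occur more than once."""
    seen = defaultdict(list)
    for key_no, key in enumerate(keys):
        for group_no, group in enumerate(key.split("-")):
            seen["".join(sorted(group[1:]))].append((key_no, group_no, group))
    return {chars: hits for chars, hits in seen.items() if len(hits) > 1}


keys = [
    "5SFHJ-GHP1B-FCSCQ-KBCQB-86J50",
    "C9W27-N2SJ3-8HB1P-4VD0F-R9QXL",
]
print(anagram_groups(keys))
```

On these two keys it reports only the HP1B/HB1P pair.  In zero-based
terms, that is group 1 of the first key and group 2 of the second.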

In any case, as others pointed out in this thread, if I really
had the code on my computer I would break out my disassembler
and start tracing code to reverse engineer it.  Much easier
than trying to figure out the algorithm from these clues.


				Andrew
				dalke at dalkescientific.com



More information about the Python-list mailing list