Python for Reverse Engineering

Sun Nov 7 20:40:16 EST 2004

"Brad Tilley" <rtilley at vt.edu> wrote in message
news:cmlj6b$7j9$1 at solaris.cc.vt.edu...
> Duncan Smith wrote:
> >>If your friend was worth his salt as an algorithm writer, he'd be
> >>more...
> >>But--he is probably making impossible claims here, anyway...
> >>
> >
> >
> > [snip]
> >
> > Doesn't that depend on exactly what is meant by "figure out exactly how the
> > algorithm works"?  If it means identify (with absolute certainty) the
> > algorithm used to generate the strings, then it surely can't be possible.
> >
> > Duncan
>
> That's my thought as well. I don't want to know exactly how the
> algorithm generates strings. But, I think that if I analyze enough
> strings I should know, on some level, what an acceptable string looks like.
>
> Samba coders never see Microsoft's file and print sharing source code,
> yet they are able to emulate an NT server quite well just by observing
> packets.

Right, so you might be able to come up with something that produces similar
output.  Do you know if the strings are generated independently?  If so,
there must be some stochastic component (or unknown inputs) or the strings
would be identical.  How about the frequencies of the characters?  Are some
(significantly) more frequent than others?  Do some characters follow others
with unusually high frequency?  Do characters tend to cluster (more or less
than you'd expect from independently generated characters)?  Unless the
strings are very long you probably can't answer these questions too reliably
with only 100 strings.
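The questions above (character frequencies, which characters follow which, clustering) can be tallied with a few lines of standard-library Python. This is a minimal sketch, assuming each string is an ordinary Python `str`; `describe` and the sample strings are hypothetical placeholders. The "expected" repeat count assumes characters are independent draws from the observed frequencies, so an observed count well above it hints at clustering, and one well below it hints that characters avoid repeating.

```python
from collections import Counter

def describe(strings):
    """Return character counts, bigram counts, and observed vs. expected
    adjacent repeats for a list of strings."""
    chars = Counter()
    bigrams = Counter()
    repeats = 0  # adjacent identical characters actually observed
    pairs = 0    # adjacent character pairs examined
    for s in strings:
        chars.update(s)
        for a, b in zip(s, s[1:]):
            bigrams[a + b] += 1
            repeats += int(a == b)
            pairs += 1
    total = sum(chars.values())
    # If every character were drawn independently from the observed
    # frequencies, each adjacent pair would match with probability sum(p**2).
    expected = pairs * sum((n / total) ** 2 for n in chars.values())
    return chars, bigrams, repeats, expected

# Hypothetical placeholders; substitute the ~100 real strings.
samples = ["AB12CD", "ABAB99", "CD12AB"]
chars, bigrams, observed, expected = describe(samples)
print("most common characters:", chars.most_common(5))
print("most common bigrams:", bigrams.most_common(5))
print(f"adjacent repeats: observed {observed}, expected about {expected:.1f}")
```

Bigram counts thin out quickly (an alphabet of 36 symbols already has 1296 possible pairs), which is one concrete reason 100 short strings may not answer these questions reliably.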

Markov chains are a possibility (as already mentioned).  I'd probably start
by looking at the simpler things first.  The 1000 to 1200 sum might be a
clue, particularly if you're having trouble emulating it.  Of course, if
this turns out to be some sort of code and you're looking at some encoded
text, then that's something I know little about, and the above might be next
to useless.
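For the Markov chain route, a minimal sketch of a character-level chain that learns "what tends to follow what" from sample strings and then generates look-alikes. It assumes the characters `^` and `$` never occur in the data, since they are used here as hypothetical start and end markers; `train` and `generate` are illustrative names, not from any library. A model like this only reproduces local statistics, so it would not by itself capture a global property such as the 1000 to 1200 sum.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical sentinels; assumed absent from the data

def train(strings, order=1):
    """Map each context of `order` characters to the characters seen after it."""
    model = defaultdict(list)
    for s in strings:
        padded = START * order + s + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=1, rng=random, max_len=200):
    """Walk the chain from the start context until END (or max_len)."""
    context = START * order
    out = []
    while len(out) < max_len:
        # Duplicates in the follower list make choice() frequency-weighted.
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)

samples = ["abc", "abd"]  # hypothetical stand-ins for the observed strings
model = train(samples, order=1)
rng = random.Random(42)
print([generate(model, order=1, rng=rng) for _ in range(5)])
```

Raising `order` makes the output resemble the samples more closely but needs far more data to avoid simply replaying the training strings.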

Maybe the basic statistical tests in Gary Strangman's stats.py would be
useful, unless you already have R and RPy?
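If neither stats.py nor R/RPy is at hand, a basic goodness-of-fit check can be approximated with the standard library. The sketch below is a swapped-in technique, a Monte Carlo version of Pearson's chi-square test, and not the API of stats.py or R. It asks whether characters look uniformly distributed over an alphabet you supply; the alphabet, the function names, and the sample strings are all assumptions for illustration.

```python
import random
import string
from collections import Counter

def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed counts against a uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)

def uniformity_p_value(text, alphabet, trials=2000, seed=0):
    """Monte Carlo p-value: how often does uniformly random text of the same
    length give a chi-square statistic at least as large as the observed one?"""
    observed = Counter(c for c in text if c in alphabet)  # ignore characters outside the alphabet
    counts = [observed[c] for c in alphabet]
    n = sum(counts)
    if n == 0:
        raise ValueError("text contains no characters from the alphabet")
    stat = chi_square_uniform(counts)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sim = Counter(rng.choice(alphabet) for _ in range(n))
        if chi_square_uniform([sim[c] for c in alphabet]) >= stat:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly zero.
    return stat, (hits + 1) / (trials + 1)

# Hypothetical: pool the strings and test against an assumed alphabet.
pooled = "".join(["AB12CD", "ABAB99", "CD12AB"])
stat, p = uniformity_p_value(pooled, string.ascii_uppercase + string.digits)
print(f"chi-square = {stat:.1f}, Monte Carlo p ~ {p:.3f}")
```

Simulating the null distribution avoids relying on the chi-square approximation when expected counts per character are small, as they will be with short strings. Pooling all strings does assume positions within a string behave alike, which is itself worth checking.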

Duncan