Python obfuscation

Serge Orlov Serge.Orlov at gmail.com
Fri Nov 18 13:31:37 EST 2005


Ben Sizer wrote:
> Mike Meyer wrote:
> > "Ben Sizer" <kylotan at gmail.com> writes:
> > > Decompyle (http://www.crazy-compilers.com/decompyle/ ) claims to be
> > > pretty advanced. I don't know if you can download it any more to test
> > > this claim though.
> >
> > No, it doesn't claim to be advanced. It claims to be good at what it
> > does. There's no comparison with other decompilers at all. In
> > particular, this doesn't give you any idea whether or not similar
> > products exist for x86 or 68k binaries.
>
> That's irrelevant. We don't require a citable source to prove the
> simple fact that x86 binaries do not by default contain symbol names
> whereas Python .pyc and .pyo files do contain them. So any
> decompilation of (for example) C++ code is going to lose all the
> readable qualities, as well as missing any symbolic constants,
> enumerations, templated classes and functions, macros,  #includes,
> inlined functions, typedefs, some distinctions between array indexing
> and pointer arithmetic, which inner scope a simple data variable is
> declared in, distinctions between functions/member functions declared
> as not 'thiscall'/static member functions, const declarations, etc.

If your protection actually boils down to "if (licensed) ...",
everything you described will only slightly inconvenience an experienced
cracker. I've read a cracker's detailed walkthrough; it took him 26
minutes to crack a program that asks for a serial number. Basically it
looks like this: set a breakpoint on the event where the "OK" button is
pressed after a serial number is entered, set a watchpoint on the memory
where the serial number is stored, study all places where this memory is
read, and find the ultimate "jump if" instruction.



>
> > I've dealt with some very powerful disassemblers and
> > decompilers, but none of them worked on modern architectures.
>
> You can definitely extract something useful from them, but without
> symbol names you're going to have to be working with a good debugger
> and a decent knowledge of how to use it if you want to find anything
> specific. Whereas Python could give you something pretty obvious such
> as:
>
>    6 LOAD_FAST                0 (licensed)
>    9 JUMP_IF_FALSE            9 (to 21)

I can suggest at least two methods to obfuscate python byte code:

1. Apply some function before writing the byte code to a file, and apply
the reverse function upon reading.

2. Take opcode.h and assign new random numbers to the opcodes; also take
ceval.c and reorder the opcode handlers in the switch statement to make
reverse engineering even harder.

I believe a cracker will need at least several hours of manual work
before the stock python disassembler becomes usable on such files.
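
A minimal sketch of method 1, assuming a present-day Python 3 and the
standard marshal module. The single-byte XOR key and the function names
are made up for illustration; any reversible transform could take the
place of the XOR, and this one is trivially weak on its own:

```python
import marshal

KEY = 0x5A  # hypothetical single-byte key; any reversible transform would do


def scramble(data: bytes) -> bytes:
    """XOR every byte with KEY; applying it twice restores the input."""
    return bytes(b ^ KEY for b in data)


def dump_obfuscated(source: str, name: str = "<obfuscated>") -> bytes:
    """Compile source and return the scrambled, marshalled code object."""
    code = compile(source, name, "exec")
    return scramble(marshal.dumps(code))


def load_obfuscated(blob: bytes) -> dict:
    """Reverse the transform, unmarshal the code object and run it."""
    code = marshal.loads(scramble(blob))
    namespace = {}
    exec(code, namespace)
    return namespace


blob = dump_obfuscated("licensed = True\nanswer = 6 * 7\n")
namespace = load_obfuscated(blob)
```

The stock disassembler cannot read blob until the transform is undone,
but the reverse function has to ship with the program, so this only
raises the cost of the first look.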


>
> My interest lies in being able to use encrypted data (where 'data' can
> also include parts of the code) so that the data can only be read by my
> Python program, and specifically by a single instance of that program.
> You would be able to make a backup copy (or 20), you could give the
> whole lot to someone else, etc etc. I would just like to make it so
> that you can't stick the data file on Bittorrent and have the entire
> world playing with data that was only purchased once.

This is doable even in python. The basic idea is that you need to spread
your obfuscation code around and blend it with the algorithm:

1. Generate the user identity on your server and insert it into your
distribution. Spread it all over the code: don't store it in a file,
don't store it in one big variable; instead divide the user identity
into four-bit parts and spread their storage over different places. Note
that this doesn't actually have anything to do with python; it's just as
true for C/C++. If you don't follow this, your protection is vulnerable
to a replay attack: crackers will just distribute the data file + the
stolen user identity.

2. Generate custom data files for each user, using various parts of the
user id as the scrambling key for different parts of the data file. For
example: suppose you have a data file for a game and you store the
initial coordinates of characters (0..65535,0..65535) as four
bytes. Normal code to load them from the file would look like

x,y = buf[0]+256*buf[1], buf[2]+256*buf[3]

obfuscated would look like

x,y = buf[0]+c*((buf[1]+t+7)&(c-1)), buf[2]+c*((buf[3]+t+7)&(c-1))

where t contains some bits from user id and c==256
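
Both steps can be sketched together. This is only an illustration under
some assumptions: the high byte of each coordinate is the scrambled one,
the arithmetic is modulo 256, and the names split_nibbles,
scramble_coords and load_coords and the identity 0xBEEF are made up. In
a real distribution the four-bit parts would be scattered through the
code instead of sitting in one list:

```python
C = 256
OFFSET = 7  # the constant 7 from the expression above


def split_nibbles(user_id, count):
    """Split user_id into `count` four-bit parts, least significant first."""
    return [(user_id >> (4 * i)) & 0xF for i in range(count)]


def scramble_coords(x, y, t):
    """Server side: encode (x, y) as four bytes, shifting the high bytes by t."""
    x_lo, x_hi = x % C, x // C
    y_lo, y_hi = y % C, y // C
    return bytes([x_lo, (x_hi - t - OFFSET) % C,
                  y_lo, (y_hi - t - OFFSET) % C])


def load_coords(buf, t):
    """Client side: undo the per-user shift while decoding the coordinates."""
    x = buf[0] + C * ((buf[1] + t + OFFSET) & (C - 1))
    y = buf[2] + C * ((buf[3] + t + OFFSET) & (C - 1))
    return x, y


user_id = 0xBEEF                   # hypothetical identity from the server
parts = split_nibbles(user_id, 4)  # four-bit parts of the identity
t = parts[3]                       # the bits used for this part of the file
buf = scramble_coords(1000, 65535, t)
x, y = load_coords(buf, t)
```

A copy of the program built for a different user id would decode the
same bytes to different coordinates, since its t differs.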



I hope this description is not too vague. I think this approach will do
what you want. Don't forget that you will also need to bind your program
to the hardware, or users will just distribute your program + data file
together. I hope they won't mind that your program is tied to one
computer :)



