Python obfuscation

Yu-Xi Lim yuxi at ece.gatech.edu
Sat Nov 12 03:21:30 EST 2005


Alex Martelli wrote:
> It's interesting, in this context, that Civilization IV is mostly
> written in Python (interfaced to some C++ via BoostPython).
> 
> It took me 12 seconds with a search engine to determine that CivIV's
> protection uses "SafeDisc 4.60" and 30 more seconds to research that
> issue enough to convince myself that there's enough information out
> there that I could develop a crack for the thing (if I was interested in
> so doing), quite apart from any consideration of the languages and
> libraries used to develop it -- and I'm not even a particularly good
> cracker, nor am I wired into any "underground channels", just looking at
> information easily and openly available out on the web and in the index
> of a major search engine.

Yes, and I never said it's uncrackable. The cracks available are iffy and
the alternatives are sufficiently inconvenient to dissuade the
less-savvy user from attempting them. In that case, the copy protection
has succeeded.

> What I think of this thesis is on a par of what I think of this way of
> spelling the possessive adjective "its" (and equally unprintable in
> polite company).  If I could choose to eradicate only one of these two
> from the world, I'd opt for the spelling -- the widespread and totally
> unfounded belief in the worth of obfuscation is also damaging, but less
> so, since it only steals some time and energy from developers who (if
> they share this belief) can't be all that good anyway;-).

The level of pedantry here is amazing and it doesn't apply only to 
programming languages. While we are discussing my typos, I'd like to 
note that I may accidentally interchange "you're" and "your", "there", 
"they're", and "their", and a bunch of other homonyms.

I haven't seen any damage done by misusing "it's". Certainly nothing on par
with the Sony case which Mike Meyer cites as evidence against copy
protection (and presumably obfuscation, which was the topic of the
discussion).



This topic seems to be drifting. I thought I might clarify what I mean 
by "code obfuscation" to get things back on track.

Code obfuscation is a transformation of the program (whether at source
code level, intermediate object code level, binary executable level,
etc.) to hinder (prevention seems impossible) reverse engineering
(attempts to determine the workings of the code, to modify the function
of the code, etc.). While there are many possible transformations that
can be done on programs (compression, run-time optimizations, etc.), the
key here is the intent to hinder reverse engineering. I hope this is
agreeable to everyone.

Python already conveniently supports certain transformations on
programs. Off the top of my head, I think of compiled bytecode (pyc and
pyo files), and modules in zip archives. Any of these can be used as a
means of obfuscation. (Compiled languages naturally undergo
transformations which tend to be more effective against reverse
engineering.)
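
As a rough illustration of the two transformations mentioned above, here
is a minimal sketch that compiles a module to bytecode and ships only the
.pyc inside a zip archive, which Python can then import directly. The
names build_bytecode_zip, call_from_zip and hidden_mod are made up for
this example. It assumes a current Python 3, where .pyo files no longer
exist (optimized bytecode now uses names like foo.opt-1.pyc).

```python
import importlib
import os
import py_compile
import sys
import tempfile
import zipfile


def build_bytecode_zip(source_text, module_name, workdir):
    """Compile source_text and store only the resulting .pyc in a zip."""
    src_path = os.path.join(workdir, module_name + ".py")
    pyc_path = os.path.join(workdir, module_name + ".pyc")
    zip_path = os.path.join(workdir, module_name + "_bundle.zip")
    with open(src_path, "w") as f:
        f.write(source_text)
    # doraise=True turns compile errors into exceptions instead of messages.
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Only the bytecode goes into the archive; the source stays behind.
        zf.write(pyc_path, arcname=module_name + ".pyc")
    return zip_path


def call_from_zip(zip_path, module_name, func_name):
    """Import module_name from the archive and call one of its functions."""
    sys.path.insert(0, zip_path)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
        return getattr(module, func_name)()
    finally:
        # Undo the side effects so repeated calls start clean.
        sys.path.remove(zip_path)
        sys.modules.pop(module_name, None)


with tempfile.TemporaryDirectory() as workdir:
    bundle = build_bytecode_zip(
        "def greet():\n    return 'hello'\n", "hidden_mod", workdir
    )
    print(call_from_zip(bundle, "hidden_mod", "greet"))  # prints: hello
```

Note that a .pyc is tied to the interpreter version that produced it, so
a bundle like this has to be built once per supported Python version.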

Now, to address points made by Mike Meyer. He says that obfuscation adds 
steps to the release process and also cites Sony's XCP fiasco as an 
example of unseen costs of "copy protection".

Indeed, everything has a cost, and I was wrong in saying "free".
However, if convenient language-supported transforms are used, the
direct cost of using obfuscation would be minuscule in comparison to
just about everything else. Implementing it should be one simple step,
and testing it shouldn't be required (if you reasonably assume the
language isn't broken).
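
To give a sense of how small that "one simple step" could be, here is a
hedged sketch of a release helper built on the standard compileall
module. The function name make_bytecode_release and the directory layout
are assumptions for illustration, not anyone's actual build process.

```python
import compileall
import os
import shutil
import tempfile


def make_bytecode_release(source_dir, release_dir):
    """Copy source_dir to release_dir, compile it, and drop the .py files."""
    # release_dir must not exist yet; copytree creates it.
    shutil.copytree(source_dir, release_dir)
    # legacy=True writes foo.pyc next to foo.py rather than into
    # __pycache__, which is the layout importable without the source.
    ok = compileall.compile_dir(release_dir, legacy=True, quiet=1)
    if not ok:
        raise RuntimeError("compilation failed; sources left in place")
    for root, _dirs, files in os.walk(release_dir):
        for name in files:
            if name.endswith(".py"):
                os.remove(os.path.join(root, name))
    return release_dir


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "src")
    os.mkdir(src)
    with open(os.path.join(src, "app.py"), "w") as f:
        f.write("VERSION = '1.0'\n")
    release = make_bytecode_release(src, os.path.join(workdir, "release"))
    print(sorted(os.listdir(release)))
```

The original source tree is left untouched, so the step can be bolted on
at the end of an existing release script.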

I am going to ignore certain aspects of the Sony XCP case, such as the 
bad EULA and the bad PR (we shall leave that to the lawyers and 
marketing folk and stick to something we programmers can actually fix). 
What we have left is a broken software implementation of copy 
protection. If language-supported (or even OS-supported, which would 
have helped Sony*) transformations are used, we can expect to rule out 
such brokenness, i.e. no obfuscation-induced incompatibilities and 
related help-desk calls. This further reduces the unexpected costs of 
code obfuscation to zero (did I miss anything?).

This form of obfuscation is certainly weak, but given that the costs are
so tiny, why not use it? Even if you gain only one customer (and a few
dollars if you're a shareware developer), you have more than recouped
your costs. If you don't, you have probably lost 5 minutes of development
time. Is this a worthwhile gamble? I believe so.
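
For a concrete sense of how weak bytecode-only distribution is, the
sketch below recovers string constants and a disassembly from a .pyc
using only the standard library. The helper names and the sample
"licence key" are invented for the example, and the 16-byte header size
assumes Python 3.7 or later.

```python
import dis
import marshal
import os
import py_compile
import tempfile

PYC_HEADER_SIZE = 16  # header length of .pyc files since Python 3.7


def recover_code(pyc_path):
    """Load the module's code object back out of a .pyc file."""
    with open(pyc_path, "rb") as f:
        f.seek(PYC_HEADER_SIZE)  # skip magic number and metadata
        return marshal.load(f)


def collect_strings(code):
    """Gather string constants from a code object and any nested ones."""
    found = []
    for const in code.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif hasattr(const, "co_consts"):  # nested function or class body
            found.extend(collect_strings(const))
    return found


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "licence.py")
    pyc = os.path.join(workdir, "licence.pyc")
    with open(src, "w") as f:
        f.write("def check(key):\n    return key == 'SECRET-1234'\n")
    py_compile.compile(src, cfile=pyc, doraise=True)

    code = recover_code(pyc)
    print(collect_strings(code))  # the embedded key is among these
    dis.dis(code)                 # readable listing of the instructions
```

Anyone willing to run a dozen lines like these gets past the
transformation, which is consistent with calling it a hindrance for the
less-savvy user rather than a barrier.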

Mike Meyer may reiterate his point about "keeping honest people honest"
and argue that such obfuscation therefore has little ("insignificant")
benefit. Whether this little difference is "insignificant" is up to the
developer/publisher/etc. to decide. My thesis (to borrow Alex Martelli's
language) is that it is possible to obtain *some* benefit from
obfuscation with *minimal* costs.

There are physical examples of attempts to hinder reverse engineering: 
gluing the cases of devices shut and sealing integrated circuits in a
blob of epoxy, among others. With such examples, I don't think it's 
unreasonable to believe that similar possibilities exist for software 
products. This is not cited as concrete evidence, just something that 
hints at a possibility.


* Someone may start crying out, "DRM-supporter! Burn him at the stake!" 
I think code obfuscation and DRM should be approached as separate
issues, unless one believes that the user's right to software includes 
unlimited access to the source code. That itself is also a separate 
discussion, imo.



More information about the Python-list mailing list