Obfuscator, EXE, etc. - a solution

Dave Brueck dave at pythonapocrypha.com
Wed Apr 7 06:55:24 EDT 2004


Jason wrote:
> There have been many many many many discussions about obfuscating
> python.  To my dismay, most who answer are those who frequently post,
> and they say things such as:
> 1) what's the point, in theory anything could eventually be decompiled
> 2) python is used for mostly internal stuff anyway, cuz it's a "glue"
> language, so why bother

Haven't heard #2 much, btw.

> 3) use licensing and a good lawyer, it's the ONLY way
> 4) many programmers seem comfortable releasing their java and .net and
> other interpreted code products into the market, so why not you?
>
> I found most of these comments dismissive, and sometimes quite arrogant.
> Frankly, the reasons why anyone would want to protect their code are
> simple and should be observed because we are all programmers: we want to
> protect our hard work.

I think you missed a very large class of responders who feel that some minor
obfuscation is okay - something that keeps the honest people honest - but that
anything beyond that is a waste of time because no wall you put up will be high
enough to keep out someone bent on cracking your code. So, e.g., if your build
process spits out all your .py files into a zip file that has been passed
through some simple encryption, and you load them with a custom import hook,
then you're good to go.
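
A minimal sketch of that zip-plus-import-hook approach, assuming modern Python 3 and its importlib machinery (in 2004 this would have been written against the PEP 302 find_module/load_module hooks instead). The XOR "encryption", the .py.x member suffix, KEY, and the secret_sauce module are all hypothetical stand-ins. Because the key ships with the program, this keeps honest people honest and nothing more.

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile

# Hypothetical key. It has to ship with the program, so this only deters
# casual browsing; it is obfuscation, not real encryption.
KEY = b"not-a-real-secret"


def xor_bytes(data: bytes, key: bytes = KEY) -> bytes:
    """Toy scrambling: XOR with a repeating key (applying it twice restores the input)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def build_archive(sources: dict) -> bytes:
    """Build step: store each module's scrambled source in an in-memory zip.

    `sources` maps a top-level module name to its source text; the
    ".py.x" member suffix is a made-up convention for this sketch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in sources.items():
            zf.writestr(name + ".py.x", xor_bytes(text.encode("utf-8")))
    return buf.getvalue()


class ScrambledZipImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Custom import hook that finds and loads top-level modules from the zip."""

    def __init__(self, archive: bytes):
        self._zip = zipfile.ZipFile(io.BytesIO(archive))
        self._names = {n[:-len(".py.x")] for n in self._zip.namelist()
                       if n.endswith(".py.x")}

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self._names:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # defer to the normal import machinery

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        scrambled = self._zip.read(module.__name__ + ".py.x")
        source = xor_bytes(scrambled).decode("utf-8")
        code = compile(source, "<scrambled:" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


# Usage: "secret_sauce" is a hypothetical module name.
archive = build_archive({"secret_sauce": "def answer():\n    return 6 * 7\n"})
sys.meta_path.insert(0, ScrambledZipImporter(archive))
import secret_sauce

print(secret_sauce.answer())  # → 42
```

Inserting the finder at the front of sys.meta_path means it is consulted before the regular path-based finders; any module name it doesn't recognize falls through to normal imports, so the rest of the program is unaffected.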

IMO the real problem with so many of the schemes people have proposed is that
they place an inordinate burden on the developer for very little improvement in
"security". And THAT is what leads the discussion to some of the answers you
listed above. If an obfuscation scheme makes debugging too difficult, or makes
the programmer avoid a big set of useful features, etc., then it's not worth it
because at best it will improve security only slightly.

I'd wager that many people aren't opposed to obfuscation per se, they're
opposed to obfuscation whose cost is greater than the benefit, and that is the
case with 99% of the schemes suggested. That's the group I fall into - I think
code obfuscation is interesting but rarely worth it.

I'd like to add to your list:

5) Source code is much less valuable than many people think. With the exception
of stuff like code to read/write proprietary file formats or wildly efficient
implementations of certain algorithms, there's just not a lot of code out there
that, in itself, is all that special. I mean, I take pride in my work and don't
want to get ripped off, but I'm not fooling myself - it's stuff people could
figure out on their own (which raises the question: specifically what program
do you have that is so innovative that it warrants something more than a
license and perhaps the most trivial code obfuscation?)

Note that file formats and algorithms themselves are patentable, and (setting
aside whether or not you agree that these types of patents are a Good Thing) a
patent is even better than protecting the code, because you're protected even
against a clean-room implementation.

Also, just getting your hands on the source code is a far cry from
understanding it well enough to maintain it, improve it, and extend it. In
order to do that reliably, you have to become quite familiar with the code, and
to do that you go through much of the same process one would go through to
write it from scratch in the first place (this is also why, for example, when a
developer leaves a company it's not uncommon for that developer's code to get
tossed/replaced soon - the company still has the code but it doesn't have the
knowledge and understanding that went into making the code, and re-acquiring
that knowledge requires about as much work as rewriting it does).

A common use case for code protection is for code that ensures the user has the
rights to use the software (e.g. you can't play the game unless you have a
valid CD key). This is an interesting one to me, but the abundance of cracks
and warez out there is a clear indication that hiding the source code does
little to hinder the crackers. The solutions that do work or at least provide
some relief apply to Python programs as well as they do to C programs.

6) When people talk about obfuscating their code, they don't really seem to
spend much time thinking about who exactly they are hiding it from. There are
people who will take your code to make money off it (businesses) and everyone
else. Businesses are generally _very_ careful about following license
agreements wrt code because of the liabilities - I'd LOVE it if a business
stole my code. To a business, a license is a far bigger deterrent than any
code hiding.

The non-business group of people who will not be stopped by simple code
obfuscation are also the people who can't really be beaten by ANY obfuscation:
for any amount of time T that you're willing to spend creating schemes to hide
code from them, they are willing to spend some amount of time U to circumvent
it, where U >> T. Heck, for many of them it's a game. So you have to decide how
much time you're willing to spend in an arms race with people who you're not
going to make any money off anyway. It's not like these are would-be customers,
and most of the people who use their cracks aren't would-be customers either,
and hiding the source code does little to prevent cracks anyway.

> 1) Anything could eventually be decompiled.... yes that's true.  In a
> perfect world.  Have you ever tried to decompile C code and make sense
> of it?

It's non-trivial, but certainly not extremely hard, especially once you know
the compiler that was used (and that's often easy to determine / guess).

> Try a large C program.  Good luck, you philosophers.

Philosophy? Not needed - there are *excellent* commercial decompilers
available. That's why most software licenses explicitly forbid reverse
engineering and decompiling - because any Joe can find the tools to do it.

> 2) I don't see Python as merely a glue language.

Does anybody?

> 3) Using licensing and a good lawyer.  I'm all for that!  Now your code
> has been stolen... and you are going to hire a lawyer to fight it out in
> court.

Well, do you have a specific example of something that's steal-worthy? There is
so little steal-worthy code that it's the exception rather than the rule, so
for most people and most code this is an irrational scenario to fret about.
What, your code was the missing link in their business, and now they are raking
in the cash thanks to you? Not all that likely...

I'm not saying there are no cases that should end up in a court battle, just
that in reality they'd be really rare, and when one did happen it WOULD be
worth it for you to stick it out to the end because (1) there would be
significant upside and (2) if they did steal your code and use it you'd
probably have a good chance of proving it.

Stealing code is extremely risky for a company (almost unlimited liability).

> 4) Others release their java and .net programs.  Many obfuscate their
> code before doing so, for the very same reasons a Python programmer
> would want to do so.

But just because other people do it doesn't mean that (1) what they're doing
really makes sense or that (2) the benefit exceeds the cost or that (3) what
they are protecting is worth stealing.

(besides, if they're using Java or .Net then their judgement skills are already
suspect ;-)   It's a joke. Laugh. )

> Here's my solution, it's not perfect, but it works well:
> Use Pyrex, which translates your python sources (virtually unchanged) to

Hey, if this works for you, more power to you - that's great that you've found
something that suits your needs.

For me, this is yet another example of a "solution" that puts way too much
burden on the developer while providing (at best) modest gains. Pyrex is
awesome for writing extensions, but last time I checked it didn't support (in
addition to what you listed) list comprehensions, generators, functions defined
inside functions, and Unicode. These are all things I'm willing to live without
when writing Pyrex extensions, but certainly not in general - they are features
that for me provide tangible benefit (in terms of productivity, code
maintainability, etc.), so avoiding them is an actual cost. The benefit over
e.g. a trivially-encrypted zip of the .pyc files is tiny, so the scheme doesn't
seem worth it.
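
For comparison, a rough sketch of the bytecode-only variant, assuming Python 3: each module is compiled, serialized with marshal (the same serialization .pyc files use for code objects), and XOR-scrambled at build time, then executed from the blob at run time. KEY, scramble, build_blob, load_blob, and the greeter module are hypothetical names for illustration, not an existing tool.

```python
import marshal
import types

# Hypothetical single-byte key; anyone holding the program holds the key too.
KEY = 0x5A


def scramble(data: bytes) -> bytes:
    """XOR every byte with KEY; running it twice restores the original bytes."""
    return bytes(b ^ KEY for b in data)


def build_blob(source: str, name: str) -> bytes:
    """Build step: compile to a code object, marshal it, then scramble it."""
    code = compile(source, name, "exec")
    return scramble(marshal.dumps(code))


def load_blob(blob: bytes, name: str) -> types.ModuleType:
    """Run time: unscramble, unmarshal, and execute the bytecode in a fresh module."""
    module = types.ModuleType(name)
    exec(marshal.loads(scramble(blob)), module.__dict__)
    return module


# Usage: the shipped blob contains no source text, only scrambled bytecode.
blob = build_blob("def greet(who):\n    return 'hello, ' + who\n", "greeter")
greeter = load_blob(blob, "greeter")
print(greeter.greet("world"))  # → hello, world
```

The marshal format can change between Python versions, so blobs like these have to be rebuilt for each interpreter release, and the unscrambled code object still carries its names and constants, which the standard dis module will print for anyone who asks.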

Again, though, if it works for you, great. I definitely wouldn't consider it a
widely applicable method of code obfuscation.

-Dave





More information about the Python-list mailing list