bytecode obfuscation

Sun Feb 6 23:15:14 EST 2005

On Sun, 2005-02-06 at 08:19, Philippe Fremy wrote:
> Adam DePrince wrote:
> > No amount of obfuscation is going to help you.
> 
> Theoretically, that's true. Anything obfuscated can be broken, just like
> the non obfuscated version. However it takes more skills and time to 
> break it. And that's the point. By raising the barrier for breaking a 
> product, you just eliminate a lot of potential crackers.
> 
> > The worst case if you depend on obscurity:  The bad guys are rounding
> > off your pennies as you read this.
> 
> That's the worst case, we all know that. A good case is to rely on open
> specs and standards. However, obscurity can help if added on top of that.
> 
> I remember an article by Fyodor explaining that if you run your internal
> Apache server on port 1234 (for an enterprise) with all the normal
> security turned on, you avoid 80% of the common crackers. You should not
> rely on the port number for security, but changing it improves it.

Okay, so only 20% of the script kiddies are filing the edges of your
coins.

It would seem that the article you quote is addressing the computer
equivalent of a burglar.  There is nothing personal between a burglar
and a home owner (from the former's perspective, at least).  Any home
will do, and the victim is chosen from a large pool of equally
uninteresting targets on the basis of vulnerability.  Those 80% don't
notice your port 1234 because looking for it serves them no purpose.
They aren't looking to break into you specifically, they are looking to
break in anywhere, and the act of changing your port, much like
attaching a chain to your wallet, makes you not worth the effort.

There are two problems with your analogy.  First, you are forgetting
the economics of the situation.  The low value of cracking yet another
web server overshadows only the minuscule cost of port scanning the
potential mark.

The second problem with your analogy lies in the very nature of the
problem you are trying to solve.  Decompiling and securing your website
are two different beasts, so much so that it is not possible to draw
conclusions about one from experiences with the other.  The former is
simply a matter of reading the code and emulating your CPU, virtual
machine, etc. by hand with paper and pencil.  Time-consuming and
tedious, but quite certain in its outcome.

The barrier to decompiling this code has to be raised above the benefit
an attacker will get from decompiling it.  If the server that this code
talks to works correctly and depends on no magic within the client to
protect itself from rounded pennies, then you are fine even if the code
is reverse engineered, for the problem becomes one of the correctness
of the implementation of the server code in the face of a hostile
client.  Doing this will foil your hacker by eliminating the benefit of
reverse engineering.
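
As a rough illustration of a server that "depends on no magic within
the client", here is a minimal Python sketch.  Everything in it is
hypothetical (the PRICES table, the order layout, and the function
names are made up for this example): the server recomputes the total
from its own price list, rounds once, and ignores whatever arithmetic
the client claims to have done.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

# Hypothetical server-side price list; the client never supplies prices.
PRICES = {"widget": Decimal("19.99"), "gadget": Decimal("0.335")}

def server_total(order):
    """Recompute the total from server-held prices, rounding once at the end."""
    total = Decimal("0")
    for item, qty in order["items"].items():
        if item not in PRICES or not isinstance(qty, int) or qty <= 0:
            raise ValueError(f"bad line item: {item!r} x {qty!r}")
        total += PRICES[item] * qty
    return total.quantize(CENT, rounding=ROUND_HALF_EVEN)

def accept_order(order):
    """Accept only when the client's claimed total matches the server's own."""
    return Decimal(str(order["claimed_total"])) == server_total(order)
```

With a check like this, a client that has been decompiled and patched
to shave a penny simply has its order rejected, so reading the client
code buys the attacker nothing.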

If the server is not robust, then the payoff becomes all of your rounded
pennies.  Even by the decadent wage standards of American programmers,
the payoff greatly exceeds the labor cost of learning how your API works
from the byte code.   Remember, anything a computer can understand is a
language that a human can learn too (there is a reason why computer
science also goes by the name computational linguistics.)  Obfuscate
your code to the end of the world and all you have done is change the
language in which the programmer will learn your algorithm.

Big deal.  Instead of turning it into Python, I print out the symbolic
representation of the Python byte code, assembly, VHDL, or whatever it
might be.  Low-level languages are not that hard to read.  Remember, if
it can run on a computer, it can run on pencil and paper.  Just look at
the trouble DRM has: they have to go so far as to introduce "trusted
hardware", aka a machine that won't let you dump the machine code to
your printer.

Adam DePrince