Python obfuscation

Alex Martelli aleax at mail.comcast.net
Sat Nov 12 12:46:05 EST 2005


Yu-Xi Lim <yuxi at ece.gatech.edu> wrote:

> Alex Martelli wrote:
> > There is no effective manner of protecting your code, except running it
> > only on well-secured machines you control yourself.  If you distribute
> > your code, in ANY form, and it's at all interesting to people with no
> > interest in respecting the law, then, it WILL be cracked (and if users
> > choose to respect the law, then you need no "protecting").
> 
> Indeed. And this extends to web services too. If you have input which can
> be observed (or even better, controlled) and output that can be observed
> too, one would be able to infer the workings of the code (reverse 
> engineering in one of its purest forms).

...unless you have "nonobservable state", of course, in which case the
inference is conceptually impossible.  For example, say that you have
developed a new and revolutionary system to predict weather, much better
than anything the competition has.  You offer it as a for-pay web
service, the customer-supplied inputs being the space-time coordinates
at which prediction is required, while the outputs returned to the customer are
a vector of possible weather conditions each with an attached
probability, just as they might be for ANY weather-prediction web
service, except that (by hypothesis, or else you won't make much money
on this;-) the outputs of your weather predictor match reality much
better than the competitors'.  "To infer" how it works would essentially
mean to reinvent your whole "revolutionary system" from scratch.
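
A toy sketch of what "nonobservable state" means here, assuming a
hypothetical HiddenStatePredictor class (every name and the arithmetic
below are invented for illustration; this is in no way a weather model).
The caller controls the inputs and sees the outputs, but each answer also
depends on private state that drifts between calls:

```python
import random


class HiddenStatePredictor:
    """Toy stand-in for a service whose answers depend on private state."""

    def __init__(self, seed):
        # Private state the customer never sees. In the weather example
        # this would be the proprietary model and its data feeds; here
        # it is just one number (an assumption made for illustration).
        self._rng = random.Random(seed)
        self._bias = self._rng.random()

    def predict(self, lat, lon, t):
        # The hidden state changes on every call...
        self._bias = (self._bias + self._rng.random()) % 1.0
        # ...and is mixed with the visible inputs to form the reply.
        p_rain = (abs(lat * 0.01 + lon * 0.02 + t * 0.03) + self._bias) % 1.0
        return {"rain": p_rain, "dry": 1.0 - p_rain}


service = HiddenStatePredictor(seed=42)
first = service.predict(33.7, -84.4, 12)
second = service.predict(33.7, -84.4, 12)  # same inputs, different reply
```

Identical requests give different replies, so input/output pairs alone
do not pin down what the service computes.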

Much the same would apply if what your system is able to predict better
than your competitors' is any other kind of phenomenon of economic
interest in a sufficiently complex real-world system -- from interest
rates to the probability that two would-be online daters will like each
other.  And it doesn't have to be prediction -- one famous system where
ESR, as a consultant, advised his clients to keep their program a trade
secret, was "just" a better heuristic than any of their competitors' for
cutting a set of given shapes with automated tools out of a slab of
wood, if I recall correctly... a problem that's computationally
intractable to solve anything but heuristically, and a better heuristic
saves wood, worktime, and/or wear and tear on the tools, therefore is
worth money.
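
To make "heuristic" concrete, here is a minimal sketch of a textbook
one, first-fit decreasing, applied to a one-dimensional simplification
of the problem (cutting pieces of given lengths out of fixed-length
boards).  It is certainly not the clients' secret heuristic, about which
I have no details; the function name and the sample data are made up:

```python
def first_fit_decreasing(lengths, stock_length):
    """Place each piece on the first board with room, longest pieces first."""
    boards = []  # each board is the list of piece lengths cut from it
    for piece in sorted(lengths, reverse=True):
        if piece > stock_length:
            raise ValueError("piece %r is longer than the stock" % (piece,))
        for board in boards:
            if sum(board) + piece <= stock_length:
                board.append(piece)
                break
        else:
            # No existing board has room: start a new one.
            boards.append([piece])
    return boards


plan = first_fit_decreasing([4, 8, 1, 4, 2, 1], stock_length=10)
```

For this input the plan uses two boards with no waste; in general the
result is good but not guaranteed optimal, which is exactly the room a
better heuristic can exploit.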

In practice, you ARE going to be able to operate your system
successfully based on keeping a good innovative algorithm or heuristic
secret, for a while -- until somebody else independently reinvents it
(or, invents something even better, in which case your secret may become
irrelevant).  IP protection is a possibility, but copyright per se might
be too weak, and whether patents apply in any given case is always
controversial (Europe soundly defeated a proposed software patent
directive, after a bitter fight, less than a year ago).

 
> If your business strategy relies heavily on a proprietary algorithm or
> even something as weak as lock-in via a proprietary "un-interoperable"
> data format, then web services is not the final answer. It may work for
> certain applications (Microsoft's for example) where the cost of reverse
> engineering is equivalent to the cost of building from scratch.

...and the latter is going to be the case for many important
"proprietary algorithms", as exemplified above.

A cryptographically sound "proprietary data format" may be essentially
impossible to break, too -- although, unlike many potential
algorithms, it has per se no added value, and may run afoul of sensible
legislation (or sensible would-be customers), such as Massachusetts',
mandating the use of standard data formats.


Alex


