Strategy to Verify Python Program is POST'ing to a web server.

Sat Jun 18 13:26:14 EDT 2011

On Sat, Jun 18, 2011 at 9:34 PM, mzagursk at gmail.com <mzagursk at gmail.com> wrote:
> I am wondering what your strategies are for ensuring that data
> transmitted to a website via a python program is indeed from that
> program, and not from someone submitting POST data using some other
> means.  I find it likely that there is no solution, in which case what
> is the best solution for sending data to a remote server from a python
> program and ensuring that it is from that program?

You're correct there: there is no solution. Everything on the other
side of your network cable should be treated as hostile and spoofed.
But the real question is, how much effort are people likely to go to
to avoid using your program?

SSL certificates are good, but they can be stolen (very easily if the
client is open source). Anything algorithmic suffers from the same
issue.

In the example you gave, there's no solution. Someone could easily
spoof it and stuff the ballot. But if you make that more difficult
than the survey is worth, then you can largely trust your data.

The other common reason for wanting to be sure that the far end really
is your script is when you're trusting the client to do data
validation. There's a solution to that one: repeat the validation on
the server, and then it doesn't matter if they use your program or
not. (And before you cry "Isn't that obvious?", a lot of people have
completely missed that point.) In neither case can you prove what
program was on the far end. You're working with network packets, so
anything can be spoofed. You could go a long way toward it, though, by
using something ridiculously complex, such as:

* Client connects via SSL to host, using a known certificate.
* Server verifies certificate, and sends client some Python code to execute.
* Client verifies the server's certificate (vital!).
* Client executes the code it's given, and based on the result, plus
some other data, sends the server a hash value.
* Server executes the same code it gave the client, knows the data it
was working with, and calculates the equivalent hash.
* If the two hashes match, the client is deemed to be valid.

This is a variant of the usual nonce-based hashing systems, where the
nonce in question is actually executable code. By randomizing the
code, you can make it difficult for any non-Python program to
duplicate the hash algorithm. But it still won't provide certainty, by
any means.

I've spent quite a bit of time this past fortnight explaining some of
these concepts to my boss and one of my coworkers; they were building
a rather elaborate system but didn't realise that, apart from
requiring about three times as much data from /dev/random, it wasn't
materially different from a simple SSL cert check...

Chris Angelico