[Tutor] hacking 101

Erik Price erikprice@mac.com
Sun, 31 Mar 2002 21:25:22 -0500


On Sunday, March 31, 2002, at 04:56 PM, Remco Gerlich wrote:

> At every place where you get user input, *in any form*, try to think of
> the weirdest form it could take, the syntax you don't expect. Question your
> assumptions - if it's user input, they're not held to your assumptions.
>
> Input length is part of the input (but not as critical in Python as it
> often is in C). Input timing is part of the input. Etc.
>
> Get into the mindset of someone who sees a set of rules for a game, and
> tries to figure out how to cheat at it.
>
> Focus at user input. Everything that comes from outside is suspect.
> Trace the path of this data through your code.

I haven't yet had a chance to use Python for web work (I'm still
learning the basics), but Remco sums up the attitude I take when
programming web applications in PHP. For security, you need a robust
server such as Apache, configured so that it cannot easily be taken
advantage of. That is really beyond the topic of programming and more
the domain of system administration. For the code itself, you write
what I call "error-checking" functions, which make sure that user input
conforms to certain criteria, or that your program knows what to do if
it doesn't.

For instance, in the content management site I'm developing for my
employer, every single user input is checked with regexes to make sure
that it's appropriate. The app won't accept letters or punctuation when
it's expecting an integer. If the user enters them anyway, I display an
error message and re-display the form with the user's values. Really,
my checks are designed more to protect the user from making a mistake
than to protect against an intruder, but they serve the same purpose.
I check for as many possibilities as I can think of. And while it would
be nice to accept "two-thousand-two" OR "2002" as input, that kind of
flexibility is really outside the scope of the app I am making. I try
to make sure that my forms use as many non-textual inputs as possible,
limiting the user to making choices rather than producing their own
input.
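In Python, the same kind of regex check might look something like the
sketch below. It is a minimal illustration, not the PHP code described
above: `validate_int_field`, the `max_len` cap, and the error strings are
hypothetical, and it assumes form values arrive as strings (as they do
from a CGI form). On failure it hands back the raw value together with an
error message, so the form can be re-displayed with what the user typed.

```python
import re

# Hypothetical pattern: an optional minus sign followed by one or more digits.
# re.ASCII keeps \d from matching non-ASCII digits (e.g. Arabic-Indic numerals).
INT_PATTERN = re.compile(r"-?\d+", re.ASCII)


def validate_int_field(raw, max_len=10):
    """Check a form value that should be an integer.

    Returns (value, None) on success, or (raw, error_message) on failure
    so the caller can re-display the form with the user's own input.
    """
    if raw is None:
        return raw, "This field is required."
    raw = raw.strip()
    # Input length is part of the input: reject absurdly long values early.
    if len(raw) > max_len:
        return raw, "Value is too long."
    # fullmatch requires the whole string to match, not just a prefix.
    if not INT_PATTERN.fullmatch(raw):
        return raw, "Please enter a whole number (digits only)."
    return int(raw), None


print(validate_int_field("2002"))  # (2002, None)
print(validate_int_field("20O2"))  # ('20O2', 'Please enter a whole number (digits only).')
```

Using `fullmatch` rather than `match` matters here: `match` would accept
"2002abc" because it only anchors at the start of the string.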

Of course, on this last note, be very careful: just because the form
only displays three choices doesn't mean that those are the only choices
the user has! For instance, perhaps the user isn't even using a web
browser to talk to the script. Perhaps they've telnetted in and are
submitting POST data that I wasn't prepared for. If so, I am in trouble
if I assumed the user would enter nothing but 1, 2, or 3. Remember that
all HTTP data is plaintext unless you've encrypted it, so even a
password form can be read by anyone who intercepts the traffic if you're
not using SSL. Also make sure that once your data has safely made the
trip from the client to the server, it stays secure there. FreeBSD seems
like a reasonably secure box but still needs to be maintained: shut down
unnecessary services, make sure you have a firewall, don't give access
to any parts of the box that aren't needed, and if your database is
shared, store passwords as one-way hashes (with md5(), for example)
rather than in plaintext.
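Two of the points above, sketched in Python under stated assumptions.
First, a submitted choice has to be checked against the allowed set on
the server, whatever the form displayed; `ALLOWED_CHOICES` and
`parse_choice` are hypothetical names. Second, for password storage this
sketch swaps in a salted, iterated hash (`hashlib.pbkdf2_hmac` from the
standard library) instead of a bare md5(), since a single unsalted MD5
digest is now considered far too fast to compute to resist guessing
attacks on passwords. The iteration count is an illustrative assumption,
not a value tuned for any particular server.

```python
import hashlib
import hmac
import os

# Hypothetical: the only values the form's select box offers.
ALLOWED_CHOICES = {"1", "2", "3"}


def parse_choice(raw):
    """Return the choice as an int, or None if it is not one we offered.

    The client can POST anything, so the check happens on the server
    regardless of what the HTML form displayed.
    """
    if raw not in ALLOWED_CHOICES:
        return None
    return int(raw)


# Illustrative iteration count (an assumption); tune it for your hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None):
    """Return (salt, digest) for storage; a random 16-byte salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


print(parse_choice("2"))  # 2
print(parse_choice("4"))  # None
salt, digest = hash_password("s3cret")
print(check_password("s3cret", salt, digest))  # True
print(check_password("guess", salt, digest))   # False
```

Only the salt and digest are stored; the password itself never is, and
`hmac.compare_digest` avoids leaking timing information during the check.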

There's quite a bit to learn in this topic, and I'm sorry I don't have
any Python-specific advice. But I'm sure there are Python equivalents
to anything that can be done in PHP, like session management and the
htmlentities() function (which converts user data into HTML entities so
that it can't 'escape' the markup with quotes, > and so on). Really,
I've never seen a hard-and-fast list of what to watch for. Just get
into web development and learn as you go. Everyone makes mistakes with
this, and I'm no exception.
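For the htmlentities() case, Python's standard library does have a
counterpart: `html.escape()` (Python 3.2 and later) replaces `&`, `<` and
`>`, and with `quote=True` (the default) also `"` and `'`, with HTML
character references. Strictly speaking it is closer to PHP's
htmlspecialchars(), since it leaves other non-ASCII characters alone. The
`render_comment` helper below is hypothetical, a sketch of escaping user
data at the point where it is written into a page.

```python
import html


def render_comment(author, text):
    """Build an HTML fragment from untrusted user input.

    Every user-supplied value is escaped before it goes into the markup,
    so quotes and angle brackets cannot break out of the attribute or element.
    """
    safe_author = html.escape(author, quote=True)
    safe_text = html.escape(text, quote=True)
    return '<p class="comment" title="%s">%s</p>' % (safe_author, safe_text)


print(render_comment('Bob" onmouseover="alert(1)', "<script>alert('hi')</script>"))
# <p class="comment" title="Bob&quot; onmouseover=&quot;alert(1)">&lt;script&gt;alert(&#x27;hi&#x27;)&lt;/script&gt;</p>
```

Escaping at output time, rather than when the data is stored, keeps the
original text intact in the database and applies the HTML rules only
where HTML is actually produced.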


Erik