Sanitizing untrusted code for eval()

Jim Washington jwashin at vt.edu
Mon Aug 22 19:53:46 EDT 2005


On Mon, 22 Aug 2005 22:12:25 +0200, Fredrik Lundh wrote:

> however, running a tokenizer over the source string and rejecting any string
> that contains unknown tokens (i.e. anything that's not a literal, comma,
> colon, or square or curly bracket) before evaluation might be good enough.
>
> (you can use Python's standard tokenizer module, or rip out the relevant
> parts from it and use the RE engine directly)

This seems like the right compromise, and not too difficult.
Out of the box, tokenize adds a couple of milliseconds per read,
but I can start there and optimize later, as you suggest, and be a bit more
sure that Python's parser is not abused into submission.
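
A minimal sketch of the token check described in the quote, assuming the
standard tokenize module as it exists in Python 3. The names ALLOWED_OPS,
looks_like_plain_literal and safe_eval are made up for illustration, and the
whitelist is only a literal reading of the suggestion: numbers, strings,
commas, colons, and square or curly brackets. That means negative numbers,
tuples, and names such as True or None are rejected too.

```python
import io
import tokenize

# Assumption: only these operator tokens are accepted, per the quoted advice.
ALLOWED_OPS = {",", ":", "[", "]", "{", "}"}

# Structural tokens the tokenizer always emits; they carry no content.
IGNORED = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}


def _has_f_prefix(literal):
    """True if a string literal's prefix contains f/F (an f-string)."""
    for ch in literal:
        if ch in "'\"":
            return False  # reached the opening quote without seeing f/F
        if ch in "fF":
            return True
    return False


def looks_like_plain_literal(source):
    """Return True only if every token in source is on the whitelist."""
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in IGNORED:
                continue
            if tok.type == tokenize.NUMBER:
                continue
            # Some Python 3 versions report f-strings as plain STRING tokens,
            # yet they can embed arbitrary expressions, so refuse them.
            if tok.type == tokenize.STRING and not _has_f_prefix(tok.string):
                continue
            if tok.type == tokenize.OP and tok.string in ALLOWED_OPS:
                continue
            return False  # name, other operator, comment, error token, ...
    except (tokenize.TokenError, SyntaxError):
        return False  # the tokenizer itself gave up on the input
    return True


def safe_eval(source):
    """Evaluate source only after it passes the token check."""
    if not looks_like_plain_literal(source):
        raise ValueError("rejected: source contains tokens outside the whitelist")
    # Empty builtins as a second line of defence, not a sandbox on its own.
    return eval(source, {"__builtins__": {}}, {})
```

With this, safe_eval("{'a': [1, 2]}") returns the dictionary, while
"__import__('os')" and "5|3" are refused before eval() ever sees them.
This only filters tokens; it does not limit input length or nesting depth.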

BTW, this afternoon I spent a couple of hours sending random junk to eval()
just to see what would be accepted.

I did not know before that eval() accepts all of these (expression = result):

5|3 = 7
6^3 = 5
~6 = -7
()and aslfsdf = ()

Amusing stuff.
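
For anyone who wants to reproduce these, a short sketch (the variable names
are only for illustration). None of it is a parser quirk: |, ^ and ~ are the
ordinary bitwise operators on integers, and the last one works because an
empty tuple is false, so "and" returns it without ever looking up the
undefined name.

```python
# The expressions from the list above, evaluated as-is.
expressions = ["5|3", "6^3", "~6", "()and aslfsdf"]
results = {src: eval(src) for src in expressions}

# 5|3   : 0b101 | 0b011 == 0b111 == 7   (bitwise OR)
# 6^3   : 0b110 ^ 0b011 == 0b101 == 5   (bitwise XOR)
# ~6    : -(6 + 1) == -7                (bitwise NOT)
# ()and aslfsdf : () is falsy, so "and" short-circuits and returns ()
for src, value in results.items():
    print(f"{src!r} -> {value!r}")

# With a non-empty (truthy) left operand the right side is evaluated,
# and the undefined name finally raises NameError.
try:
    eval("(1,)and aslfsdf")
    name_error_raised = False
except NameError:
    name_error_raised = True
print("NameError with truthy left operand:", name_error_raised)
```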

Thanks!

-Jim Washington


