Simple and safe evaluator

Simon Forman sajmikins at gmail.com
Thu Jun 19 19:10:09 EDT 2008


On Jun 16, 8:32 pm, bvdp <b... at mellowood.ca> wrote:
> sween... at acm.org wrote:
> > On Jun 17, 8:02 am, bvdp <b... at mellowood.ca> wrote:
>
> >> Thanks. That was easy :)
>
> >>> The change to the _ast version is left as an exercise to the reader ;)
> >> And I have absolutely no idea on how to do this. I can't even find the
> >> _ast import file on my system. I'm assuming that the _ast definitions
> >> are buried in the C part of python, but that is just a silly guess.
>
> >> Bob.
>
> > If you just need numeric expressions with a small number of functions,
> > I would suggest checking the expression string first with a simple
> > regular expression, then using the standard eval() to evaluate the
> > result.  This blocks the attacks mentioned above, and is simple to
> > implement.  This will not work if you want to allow string values in
> > expressions though.
>
> > import re
> > def safe_eval( expr, safe_cmds=[] ):
> >    toks = re.split( r'([a-zA-Z_\.]+|.)', expr )
> >    bad = [t for t in toks if len(t)>1 and t not in safe_cmds]
> >    if not bad:
> >            return eval( expr )
>
> Yes, this appears to be about as good (better?) an idea as any.
> Certainly beats writing my own recursive descent parser for this :)
>
> And it is not dependent on python versions. Cool.
>
> I've run a few tests with your code and it appears to work just fine.
> Just a matter of populating the safe_cmds[] array and putting in some
> error traps. Piece of cake. And should be fast as well.
>
> Thanks!!!
>
> Bob.
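
To make the quoted check concrete, here is a minimal sketch of just its
filtering step, written in Python 3 syntax. The regular expression is the
one quoted above; the function name find_unlisted_names is made up for
illustration, and I use a tuple default instead of a list.

```python
import re

def find_unlisted_names(expr, safe_cmds=()):
    # Split into runs of letters/underscores/dots, or single characters.
    # The capture group makes re.split() return the matched pieces too.
    toks = re.split(r'([a-zA-Z_\.]+|.)', expr)
    # Any multi-character run that is not explicitly allowed is suspect.
    return [t for t in toks if len(t) > 1 and t not in safe_cmds]

print(find_unlisted_names('2*sin(x)'))                      # ['sin']
print(find_unlisted_names('2*sin(x)', safe_cmds=('sin',)))  # []
print(find_unlisted_names('__import__("os")'))              # ['__import__', 'os']
```

Note that single-character names such as x always get through, since only
tokens longer than one character are compared against safe_cmds.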

FWIW, I got around to implementing a function that checks if a string
is safe to evaluate (that it consists only of numbers, operators, and
"(" and ")").  Here it is. :)

import cStringIO, tokenize


def evalSafe(source):
    '''
    Return True if a source string is composed only of numbers,
    operators or parentheses, otherwise return False.
    '''
    try:
        src = cStringIO.StringIO(source).readline
        src = tokenize.generate_tokens(src)
        # Compare token types by value; they are plain integers.
        src = (token for token in src if token[0] != tokenize.NL)

        for token in src:
            ttype, tstr = token[:2]

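            # "(" and ")" are OP tokens as well; the empty string of the
            # final ENDMARKER token also passes the "in" test.
            if (
                tstr in "()" or
                (ttype in (tokenize.NUMBER, tokenize.OP)
                 and tstr != ',') # comma is an OP.
                ):
                continue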
            raise SyntaxError("unsafe token: %r" % tstr)

    except (tokenize.TokenError, SyntaxError):
        return False

    return True

for s in (

    '(1 2)', # Passes the check, but isn't valid math.

    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 ))', # Passes the check.

    '1001 * 99 / (73.8 ^ 88 % (88 + 23e-10 )',
    # Rejected: TokenError due to missing close parenthesis.

    '(1, 2)', # Rejected: SyntaxError due to comma.

    'a * 21', # Rejected: SyntaxError due to identifier.

    'import sys', # Rejected: SyntaxError due to identifiers.

    ):
    print evalSafe(s), '<--', repr(s)
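
The function above only validates a string; it does not evaluate it. As a
rough sketch of how the two steps might fit together, here is a Python 3
adaptation (io.StringIO in place of cStringIO). The names is_arithmetic
and evaluate_checked are hypothetical, and skipping NEWLINE tokens is my
assumption, since newer Python 3 tokenizers emit one even when the source
has no trailing newline.

```python
import io
import tokenize

# Structural tokens that carry no content of their own.
_SKIP = (tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER)

def is_arithmetic(source):
    """Return True if source tokenizes to numbers and operators only."""
    try:
        readline = io.StringIO(source).readline
        for tok in tokenize.generate_tokens(readline):
            if tok.type in _SKIP or tok.type == tokenize.NUMBER:
                continue
            # Parentheses are OP tokens; the comma is one too, so exclude it.
            if tok.type == tokenize.OP and tok.string != ',':
                continue
            return False
    except (tokenize.TokenError, SyntaxError):
        # E.g. an unclosed parenthesis.
        return False
    return True

def evaluate_checked(source):
    """Evaluate source only if it passes the token check (hypothetical helper)."""
    if not is_arithmetic(source):
        raise ValueError("unsafe expression: %r" % source)
    # Empty builtins as an extra precaution; eval() can still raise,
    # for example SyntaxError on '(1 2)'.
    return eval(source, {"__builtins__": {}}, {})

print(evaluate_checked('2 * (3 + 4)'))  # 14
```

Passing the check does not guarantee that evaluation succeeds or finishes
quickly: '(1 2)' still raises SyntaxError in eval(), and something like
9**9**9 passes the check but can take a very long time to compute.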





