Simple eval

Sun Nov 18 22:48:50 EST 2007

On Nov 18, 8:24 pm, greg <g... at cosc.canterbury.ac.nz> wrote:

> Tor Erik Sønvisen wrote:
> > Comments, speedups, improvements in general, etc are appreciated.
>
> You're doing a lot of repeated indexing of token[0]
> and token[1] in your elif branches. You might gain some
> speed by fetching these into locals before entering the
> elif chain.
>
> Also you could try ordering the branches so that the
> most frequent cases come first. Probably strings and
> numbers first, then the various kinds of bracket.
> This would also give you a chance to avoid pulling out
> token[1] until you need it.
>
> token[1].startswith('u'): It's probably faster to
> use an index to get the first character, if you know
> that the string is not empty.

I tried several of these micro-optimizations, but they made very
little difference; eval() remains roughly 5 times faster. The major
bottleneck is generate_tokens(): even replacing simple_eval() with the
following stub, which only consumes the tokens, is still 3 times
slower than eval():

def simple_eval(source):
    # Tokenize only; no value is built or returned.
    for _ in generate_tokens(StringIO(source).readline):
        pass
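
A rough way to reproduce this kind of comparison is with timeit. The
sketch below is written for Python 3 (io.StringIO rather than
cStringIO), so it is not the setup behind the figures above; the
sample literal, the helper names tokenize_only() and compare(), and
the repeat count are all arbitrary assumptions. The ratio will vary
with the interpreter version and the input.

```python
import timeit
from io import StringIO
from tokenize import generate_tokens

# Arbitrary sample literal (an assumption, not the data behind the figures).
SOURCE = "[1, 2.5, 'abc', (None, True), {'k': [False, -3]}]"

def tokenize_only(source):
    # Walk the token stream and discard it, to measure tokenizer cost alone.
    for _ in generate_tokens(StringIO(source).readline):
        pass

def compare(number=2000):
    # Return the seconds taken by `number` runs of each approach.
    t_eval = timeit.timeit(lambda: eval(SOURCE), number=number)
    t_tok = timeit.timeit(lambda: tokenize_only(SOURCE), number=number)
    return t_eval, t_tok

if __name__ == "__main__":
    t_eval, t_tok = compare()
    print("eval():            %.4f s" % t_eval)
    print("generate_tokens(): %.4f s" % t_tok)
    print("ratio:             %.1fx" % (t_tok / t_eval))
```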

That's not very surprising, since generate_tokens() is quite general
and yields more information than necessary: a 5-tuple of (type,
string, start, end, line) per token, of which only the type and string
are used here. If performance is critical, you should write your own
simple_generate_tokens(), possibly as a cut-down version of the
generic one.
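
One way such a reduced tokenizer could look is a single regular
expression over the literal subset. This is only a sketch under
assumptions: simple_generate_tokens() is the hypothetical name used
above, it yields bare token strings, and it deliberately ignores
triple-quoted strings, hex/octal numbers, comments, and floats
without a decimal point such as 1e5. It has not been benchmarked
against generate_tokens().

```python
import re

# One alternative per token kind; whitespace is matched but never yielded.
_TOKEN_RE = re.compile(r"""
    \s+                                  # whitespace (skipped)
  | (?P<tok>
        \d+\.\d*(?:[eE][-+]?\d+)?        # floats with a decimal point
      | \d+[lL]?                         # ints, optional Python 2 long suffix
      | [uU]?'(?:\\.|[^'\\\n])*'         # single-quoted string
      | [uU]?"(?:\\.|[^"\\\n])*"         # double-quoted string
      | [A-Za-z_]\w*                     # names (None, True, False)
      | [-()\[\]{},:]                    # punctuation used by literals
    )
""", re.VERBOSE)

def simple_generate_tokens(source):
    """Yield token strings only: no type, position, or line information."""
    pos = 0
    end = len(source)
    while pos < end:
        match = _TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character %r at offset %d"
                              % (source[pos], pos))
        pos = match.end()
        token = match.group("tok")
        if token is not None:   # None means the match was whitespace
            yield token

if __name__ == "__main__":
    print(list(simple_generate_tokens("[1, -2.5, 'a b', None]")))
```

Unlike generate_tokens(), this generator simply stops at the end of
the input instead of yielding an empty end-marker token, so a caller
that checks for trailing data would need to allow for that.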

Leaving performance aside, below is a slightly more compact version.
The almost identical code for handling lists and tuples is factored
out into _iter_sequence(). The 'token' parameter here is the token
string itself, not the 5-tuple yielded by generate_tokens(). Finally,
this version also handles negative and long numbers (which the
original didn't):

from string import digits
from cStringIO import StringIO
from tokenize import generate_tokens, NL

_consts = {'None': None, 'False': False, 'True': True}

def simple_eval(source):
    itertokens = generate_tokens(StringIO(source).readline)
    next = (token[1] for token in itertokens
            if token[0] is not NL).next
    res = atom(next, next())
    if next():
        raise SyntaxError("bogus data after expression")
    return res

def atom(next, token):
    def _iter_sequence(end):
        token = next()
        while token != end:
            yield atom(next, token)
            token = next()
            if token == ',':
                token = next()
    firstchar = token[0]
    if token in _consts:
        return _consts[token]
    elif firstchar in digits and token[-1] == 'L':
        return long(token)
    elif firstchar in digits:
        return float(token) if '.' in token else int(token)
    elif firstchar in '"\'':
        return token[1:-1].decode('string-escape')
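    elif token[:2] in ('u"', "u'"):
        # Only unicode string literals; a bare name such as 'undefined'
        # now falls through to the SyntaxError below.
        return token[2:-1].decode('unicode-escape')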
    elif token == '-':
        return -atom(next, next())
    elif token == '(':
        return tuple(_iter_sequence(')'))
    elif token == '[':
        return list(_iter_sequence(']'))
    elif token == '{':
        out = {}
        token = next()
        while token != '}':
            key = atom(next, token)
            next() # Skip key-value delimiter (':')
            token = next()
            out[key] = atom(next, token)
            token = next()
            if token == ',':
                token = next()
        return out
    raise SyntaxError('malformed expression (%r)' % token)
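
The code above is Python 2 (cStringIO, long, generator .next, and the
'string-escape' codec). For anyone trying it on Python 3, a rough
port might look like the sketch below. It is not the code above
verbatim, and it rests on several assumptions: string tokens are
handed to ast.literal_eval() because the escape codecs are gone,
NEWLINE tokens are filtered as well as NL, the long branch is dropped
since int is unbounded, and an explicit guard is added for an empty
token. On Python 3, ast.literal_eval() already covers this whole use
case, so this is mainly useful for seeing how the approach carries
over.

```python
import ast
from io import StringIO
from tokenize import generate_tokens, NL, NEWLINE

_consts = {'None': None, 'False': False, 'True': True}

def simple_eval(source):
    tokens = generate_tokens(StringIO(source).readline)
    # Keep only the token text and drop both kinds of newline token, so
    # the end marker (whose text is '') directly follows the value.
    strings = (tok[1] for tok in tokens if tok[0] not in (NL, NEWLINE))
    next_token = strings.__next__
    res = atom(next_token, next_token())
    if next_token():
        raise SyntaxError("bogus data after expression")
    return res

def atom(next_token, token):
    def _iter_sequence(end):
        token = next_token()
        while token != end:
            yield atom(next_token, token)
            token = next_token()
            if token == ',':
                token = next_token()
    if not token:
        raise SyntaxError('unexpected end of expression')
    if token in _consts:
        return _consts[token]
    if token[0] in '0123456789':
        return float(token) if '.' in token else int(token)
    if token[-1] in '"\'':
        # Assumption: defer string decoding to ast.literal_eval.
        return ast.literal_eval(token)
    if token == '-':
        return -atom(next_token, next_token())
    if token == '(':
        return tuple(_iter_sequence(')'))
    if token == '[':
        return list(_iter_sequence(']'))
    if token == '{':
        out = {}
        token = next_token()
        while token != '}':
            key = atom(next_token, token)
            next_token()  # skip the ':' between key and value
            out[key] = atom(next_token, next_token())
            token = next_token()
            if token == ',':
                token = next_token()
        return out
    raise SyntaxError('malformed expression (%r)' % token)
```

As in the version above, a parenthesized single value such as "(1)"
comes back as a one-element tuple rather than as the bare value.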

Regards,
George