Regex substitution trouble

Chris Angelico rosuav at gmail.com
Tue Oct 28 08:59:31 EDT 2014


(Please quote enough of the previous text to provide context, and
write your replies underneath the quoted text - don't assume that
everyone's read the previous posts. Thanks!)

On Tue, Oct 28, 2014 at 11:28 PM,  <massi_srb at msn.com> wrote:
> Hi Chris, thanks for the reply. I tried to use look ahead assertions, in particular I modified the regex this way:
>
> newstring = re.sub(ur"""(?u)(\$\"[\s\w(?<=\\)\"]+\")""", subst, oldstring)
>
> but it does not work. I'm absolutely not a regex guru so I'm surely missing something.

Yeah, I'm not a high-flying regex programmer either, so I'll leave the
specifics for someone else to answer. Tip, though: Print out your
regex, to see if it's really what you think it is. When you get
backslashes and quotes coming through, sometimes you can get tangled,
even in a raw string literal; sometimes, one quick print(some_re) can
save hours of hair-pulling.
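
As a minimal sketch of that tip (assuming Python 3, so the "ur" prefix and the (?u) flag from the quoted pattern are dropped; the name char_class is only for illustration), printing the pattern shows exactly what the regex engine receives. It also makes it easier to spot that the (?<=\\) part sits inside square brackets, where it is read as a set of ordinary characters and not as a lookbehind assertion:

```python
import re

# The pattern from the quoted message, without the Python 2 "ur" prefix
# and the (?u) flag.
pattern = r"""(\$\"[\s\w(?<=\\)\"]+\")"""
print(pattern)

# The character class on its own. Inside [...], the sequence (?<=\\)
# is a set of ordinary characters: ( ? < = \ )
char_class = r"""[\s\w(?<=\\)\"]"""
print(re.fullmatch(char_class + "+", "(?<=)") is not None)  # True
```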

> The strings I'm dealing with are similar to formulas, let's say something like:
>
> '$["simple_input"]+$["messed_\\"_input"]+10'
>
> Thanks for any help!

Hmm. This looks like a job for ast.literal_eval with an actual
dictionary. All you'd have to do is replace every instance of $ with a
dict literal; it mightn't be efficient, but it would be safe. Using
Python 2.7.8 as you appear to be on 2.x:

>>> import ast
>>> expr = '$["simple_input"]+$["messed_\\"_input"]+10'
>>> values = {"simple_input":123, "messed_\"_input":75}
>>> ast.literal_eval(expr.replace("$",repr(values)))
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    ast.literal_eval(expr.replace("$",repr(values)))
  File "C:\Python27\lib\ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "C:\Python27\lib\ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Unfortunately, it doesn't appear to work, as evidenced by the above
message. It works with the full (and dangerous) eval, though:

>>> eval(expr.replace("$",repr(values)))
208

Can someone who knows ast.literal_eval() better explain what's
malformed about this? The error message in 3.4 is a little more
informative, but not much more helpful:
ValueError: malformed node or string: <_ast.BinOp object at 0x0169BAF0>
My best theory is that subscripting isn't allowed, though this seems odd.
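
The subscripting theory can be checked in isolation. A minimal sketch, assuming Python 3 (the exact wording of the error varies between versions): ast.literal_eval() is documented to accept only literal structures such as strings, numbers, tuples, lists, dicts, booleans and None, and indexing into a dict is an operation, not a literal.

```python
import ast

# A plain dict literal is accepted...
print(ast.literal_eval('{"simple_input": 123}'))

# ...but subscripting that same literal is rejected, since a subscript
# is an operation rather than a literal.
try:
    ast.literal_eval('{"simple_input": 123}["simple_input"]')
except ValueError as exc:
    print("rejected:", exc)
```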

In any case, it ought in theory to be possible to use Python's own
expression parser on this. You might have to do some manipulation, but
it'd mean you can leverage a full expression evaluator that already
exists. I'd eyeball the source code for ast.literal_eval() and see
about making an extended version that allows the operations you want.

If you can use something other than a dollar sign - something that's
syntactically an identifier - you'll be able to skip the textual
replace() operation, which is risky (might change the wrong thing). Do
that, and you could have your own little evaluator that uses the ast
module for most of its work, and simply runs a little recursive walker
that deals with the nodes as it finds them.
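
The recursive walker described above might look something like the following. This is a rough sketch under stated assumptions, not a finished implementation: it assumes Python 3.8 or later (where ast.parse() produces ast.Constant nodes), it assumes the dollar sign has been swapped for the identifier V, and the names evaluate and _BINOPS are made up for illustration. Only numeric constants, four arithmetic operators, unary minus and V["key"] lookups are allowed; every other node type is rejected.

```python
import ast
import operator

# Whitelisted binary operators; anything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr, values, name="V"):
    """Evaluate arithmetic over numbers and name["key"] lookups."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if (isinstance(node, ast.Subscript)
                and isinstance(node.value, ast.Name)
                and node.value.id == name):
            key = node.slice
            if type(key).__name__ == "Index":  # wrapper used by Python 3.8
                key = key.value
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                return values[key.value]
        raise ValueError("unsupported expression: %s" % ast.dump(node))
    return walk(ast.parse(expr, mode="eval"))

expr = 'V["simple_input"]+V["messed_\\"_input"]+10'
values = {"simple_input": 123, 'messed_"_input': 75}
print(evaluate(expr, values))  # 208
```

Because the parser handles the string quoting, the escaped quote inside the second key needs no special treatment, and anything outside the whitelist (a function call, an attribute access) raises ValueError without being executed.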

ChrisA


