Regex help...pretty please?

Simon Forman rogue_pedro at yahoo.com
Wed Aug 23 16:46:51 EDT 2006


MooMaster wrote:
> I'm trying to develop a little script that does some string
> manipulation. I have a few hundred strings that currently look like
> this:
>
> cond(a,b,c)
>
> and I want them to look like this:
>
> cond(c,a,b)
>
> but it gets a little more complicated because the conds themselves may
> have conds within, like the following:
>
> cond(0,cond(c,cond(e,cond(g,h,(a<f)),(a<d)),(a<b)),(a<1))
>
> What I want to do in this case is move the last parameter to the front
> and then work backwards all the way out (if you're thinking recursion
> too, I'm vindicated) so that it ends up looking like this:
>
> cond((a<1), 0, cond((a<b),c,cond((a<d), e, cond((a<f), g, h))))
>
> Furthermore, the conds may be multiplied by an expression, such as the
> following:
>
> cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
>
> Here, all I want to do is switch the parameters of the conds without
> touching the expression, like so:
>
> cond(f,-1,1)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
>
> So that's the gist of my problem statement. I immediately thought that
> regular expressions would provide an elegant solution. I would go
> through the string by conds, stripping them & the () off, until I got
> to the lowest level, then move the parameters and work backwards. That
> thought process became this:
> -------------------------------------CODE--------------------------------------------------------
> import re
>
> def swap(left, middle, right):
>     left = left.replace("(", "")
>     right = right.replace(")", "")
>     temp = left
>     left = right
>     right = temp
>     temp = middle
>     middle = right
>     right = temp
>     whole = 'cond(' + left + ',' + middle + ',' + right + ')'
>     return whole
>
> def condReplacer(string):
>      #regex = re.compile(r'cond\(.*,.*,.+\)')
>      regex = re.compile(r'cond\(.*,.*,.+?\)')
>      if not regex.search(string):
>           print "whole string is: " + string
>           [left, middle, right] = string.split(',')
>           right = right.replace('\'', ' ')
>           string = swap(left.strip(), middle.strip(), right.strip())
>           print "the new string is:" + string
>           return string
>      else:
>           more_conds = regex.search(string)
>           temp_string = more_conds.group()
>           firstParen = temp_string.find('(')
>           temp_string = temp_string[firstParen:]
>           print "there are more conditionals!" + temp_string
>           condReplacer(temp_string)
> def lineReader(file):
>      for line in file:
>          regex = r'cond\(.*,.*,.+\)?'
>          if re.search(regex,line,re.DOTALL):
>             condReplacer(line)
>
> if __name__ == "__main__":
>    input_file = open("only_conds2.txt", 'r')
>    lineReader(input_file)
> -------------------------------------CODE--------------------------------------------------------
>
> I think my problem lies in my regular expression... If I use the one
> commented out I do a greedy search and in my test case where I have a
> conditional * an expression, I grab the expression too, like so:
>
> INPUT:
>
> cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
> OUTPUT:
> whole string is: (-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))
> the new string is:cond(f*((float(e*(2**4+(float(d*8+(float(c*4+(float(b*2+float(a,-1,1)
>
> when all I really want to do is grab the part associated with the cond.
> But if I do a non-greedy search I avoid that problem but stop too early
> when I have an expression like this:
>
> INPUT:
> cond(a,b,(abs(c) >= d))
> OUTPUT:
> whole string is: (a,b,(abs(c)
> the new string is:cond((abs(c,a,b)
>
> Can anyone help me with the regular expression? Is this even the best
> approach to take? Anyone have any thoughts?
>
> Thanks for your time!

You're gonna want a parser for this.  pyparsing or spark would suffice.
However, since it looks like your source strings are valid Python, you
could get some traction out of the tokenize standard library module:

from tokenize import generate_tokens
from StringIO import StringIO

s = 'cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))'

for t in generate_tokens(StringIO(s).readline):
    print t[1],


Prints:
cond ( - 1 , 1 , f ) * ( ( float ( e ) * ( 2 ** 4 ) ) + ( float ( d ) *
8 ) + ( float ( c ) * 4 ) + ( float ( b ) * 2 ) + float ( a ) )
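
The snippet above is written for Python 2 (print statement, StringIO
module). For readers on Python 3, a rough equivalent might look like the
following. The variable name tokens is only illustrative, and the filter
on empty token text is an assumption about how you want to treat the
bookkeeping tokens the tokenizer appends at the end of the input.

```python
from io import StringIO
from tokenize import generate_tokens

s = 'cond(-1,1,f)*((float(e)*(2**4))+(float(d)*8)+(float(c)*4)+(float(b)*2)+float(a))'

# Each token is a named tuple; .string holds its text.  The bookkeeping
# tokens at the end (NEWLINE, ENDMARKER) carry no visible text, so they
# are skipped here.
tokens = [t.string for t in generate_tokens(StringIO(s).readline)
          if t.string.strip()]
print(' '.join(tokens))
```

Since s contains no whitespace, joining the tokens back together with an
empty separator gives the original string again.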

Once you've got that far the rest should be easy.  :)
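
To make "the rest" a bit more concrete, here is one possible sketch, in
Python 3, and not the only way to do it. It tokenizes the string once,
finds each cond( call, locates its matching close paren and its
top-level commas by counting nesting depth, and recurses into the
arguments so nested conds get rotated too. The name reorder_conds and
the helper rewrite are made up for this example. It assumes each input
is a single line with balanced parentheses, and it leaves any cond that
does not have exactly three arguments in its original order.

```python
from io import StringIO
from tokenize import generate_tokens, NAME, OP

OPENERS = ('(', '[', '{')
CLOSERS = (')', ']', '}')


def reorder_conds(expr):
    """Return expr with each cond(a, b, c) call rewritten as cond(c, a, b)."""
    # Tokenize once and keep (type, text, start column, end column).
    # Tokens without visible text (NEWLINE, ENDMARKER, ...) are dropped.
    toks = [(t.type, t.string, t.start[1], t.end[1])
            for t in generate_tokens(StringIO(expr).readline)
            if t.string.strip()]

    def rewrite(lo, hi, start, end):
        # Rewrite the tokens toks[lo:hi], which lie inside expr[start:end].
        out = []
        pos = start  # first character of expr not yet copied into out
        i = lo
        while i < hi:
            if not (toks[i][:2] == (NAME, 'cond') and i + 1 < hi
                    and toks[i + 1][:2] == (OP, '(')):
                i += 1
                continue
            # Walk to the matching ')' and record the commas that separate
            # this call's own arguments (nesting depth 1).
            depth, j, bounds = 0, i + 1, [i + 1]
            while True:
                typ, text = toks[j][:2]
                if typ == OP and text in OPENERS:
                    depth += 1
                elif typ == OP and text in CLOSERS:
                    depth -= 1
                    if depth == 0:
                        break
                elif typ == OP and text == ',' and depth == 1:
                    bounds.append(j)
                j += 1
            bounds.append(j)
            # Recurse into each argument so that nested conds are rewritten.
            args = [rewrite(a + 1, b, toks[a][3], toks[b][2]).strip()
                    for a, b in zip(bounds, bounds[1:])]
            if len(args) == 3:
                args = [args[2], args[0], args[1]]  # last argument first
            out.append(expr[pos:toks[i][2]])        # text before this cond
            out.append('cond(' + ','.join(args) + ')')
            pos = toks[j][3]                        # resume after the ')'
            i = j + 1
        out.append(expr[pos:end])
        return ''.join(out)

    return rewrite(0, len(toks), 0, len(expr))


print(reorder_conds('cond(a,b,(abs(c) >= d))'))
```

Because the arguments are sliced out of the original string by column
offset, everything outside the conds (such as the multiplied expression
in your second example) is copied through untouched, and going through
the tokenizer means a name like second( is not mistaken for cond(.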

Peace,
~Simon

http://pyparsing.wikispaces.com/
http://pages.cpsc.ucalgary.ca/~aycock/spark/
http://docs.python.org/lib/module-tokenize.html



