[Tutor] Multiple regex replacements, lists and for.

Evert Rol evert.rol at gmail.com
Tue Oct 12 09:02:37 CEST 2010


> I'm new to python and inexperienced in programming but I'm trying hard.
> I have a shell script that I'm converting over to python.
> Part of the script replaces some lines of text.
> I can do this in python, and get the output I want, but so far only using sed.
> Here's an example script:
> 
> import subprocess, re
> list = ['Apples	the color red', 'Sky	i am the blue color', 'Grass	the
> colour green is here', 'Sky	i am the blue color']
> 
> def oldway():
> 	sed_replacements = """
> s/\(^\w*\).*red/\\1:RED/
> s/\(^\w*\).*blue.*/\\1:BLUE/"""
> 	sed = subprocess.Popen(['sed', sed_replacements],
> stdin=subprocess.PIPE, stdout=subprocess.PIPE)
> 	data = sed.communicate("\n".join(list))[:-1]
> 	for x in data:
> 		print x
> oldway();
> 
> """ This produces:
> 
>>>> Apples:RED
>>>> Sky:BLUE
>>>> Grass	the colour green is here
>>>> Sky:BLUE
> 
> Which is what I want"""
> 
> print "---------------"
> 
> def withoutsed():
> 	replacements = [
> (r'.*red', 'RED'),
> (r'.*blue.*', 'BLUE')]
> 	for z in list:
> 		for x,y in replacements:
> 			if re.match(x, z):
> 				print re.sub(x,y,z)
> 				break
> 			else:
> 				print z
> withoutsed();
> 
> """ Produces:
> 
>>>> RED
>>>> Sky	i am the blue color
>>>> BLUE
>>>> Grass	the colour green is here
>>>> Grass	the colour green is here
>>>> Sky	i am the blue color
>>>> BLUE
> 
> Duplicate printing + other mess = I went wrong"""
> 
> I understand that it's doing what I tell it to, and that my for and if
> statements are wrong.


You should make your Python regexes work more like your sed script. re.sub() always returns a string, either changed or unchanged. So you can "pipe" the two necessary re.sub() calls into each other, like you do in sed: re.sub(pattern2, replacement2, re.sub(pattern1, replacement1, string)). That removes the inner for loop, because you can do all the replacements in one go.
re.sub() will return the original string if there was no replacement (again like sed), so you can remove the if-statement with the re.match: re.sub() will leave the 'Grass' sentence untouched, but still print it.
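
To illustrate the chaining idea, here is a minimal sketch that reuses the two patterns from your withoutsed() function as they are. The function name replace_colors is made up for this example, and print() is written with parentheses so the snippet runs on both Python 2 and Python 3.

```python
import re

def replace_colors(line):
    # Hypothetical helper name, used only for this illustration.
    # The inner re.sub() handles "red"; its result (changed or not)
    # is fed straight into the outer re.sub() for "blue".
    return re.sub(r'.*blue.*', 'BLUE', re.sub(r'.*red', 'RED', line))

# A line that matches neither pattern comes back unchanged,
# so no re.match() test is needed before substituting.
print(replace_colors('Apples\tthe color red'))
print(replace_colors('Grass\tthe colour green is here'))
```

Note that this still drops the leading word ('Apples', 'Sky'), because the patterns do not capture it yet.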
Lastly, in your sed expression you capture the leading word characters (\w*) in a group and substitute them back into the replacement, but you don't do that in re.sub(). Again, the format is practically the same; the main difference is that in Python regexes you don't need to escape the grouping parentheses.
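
As a small sketch of how the group carries over, here is your sed "blue" rule next to a Python equivalent, applied to a single line only. The variable names line and result are just for illustration; raw strings (r'...') keep the backslashes intact.

```python
import re

line = 'Sky\ti am the blue color'

# sed:    s/\(^\w*\).*blue.*/\1:BLUE/
# Python: unescaped parentheses form the group, and \1 in the
#         replacement string refers back to what the group matched.
result = re.sub(r'(^\w*).*blue.*', r'\1:BLUE', line)
print(result)  # Sky:BLUE
```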

I can give you the full solution, but I hope this pointer in the right direction is good enough. All in all, your code can be as efficient in Python as in sed.

Cheers,

  Evert


Oh, btw: semi-colons at the end of a statement in Python are allowed, but redundant, and look kind of, well, wrong.


> What I want to do is replace matching lines and print them, and also
> print the non-matching lines.
> Can somebody please point me in the right direction?
> 
> Any other python pointers or help much appreciated,
> 
> Will.
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor


