re.sub() backreference bug?

Tim Chase python.list at tim.thechases.com
Thu Aug 17 17:36:44 EDT 2006


> s = re.sub(r'([A-Z]+)([A-Z][a-z])', "\1_\2", s)
> s = re.sub(r'([a-z\d])([A-Z])', "\1_\2", s)
> i expect to get:
> hello_world19_foo_bar
> 
> but instead i get:
> hell☺_☻orld19_fo☺_☻ar


Looks like you need to be using "raw" strings for your 
replacements as well:

s = re.sub(r'([A-Z]+)([A-Z][a-z])', r"\1_\2", s)
s = re.sub(r'([a-z\d])([A-Z])', r"\1_\2", s)

This keeps the backslashes as literal backslashes, so re.sub() 
sees \1 and \2 as group references. In a normal string, "\1" 
and "\2" are octal escape sequences, so Python turns them into 
the characters with codes 1 and 2 before re.sub() ever sees them.
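
A quick way to see the difference, as a minimal sketch: the input 
"fooBar" is just a made-up sample (not your original string), and 
the variable names are only for illustration.

```python
import re

# In a normal string literal, "\1" is an octal escape: one character, code 1.
assert "\1" == "\x01" and len("\1") == 1
# In a raw string, r"\1" is two characters: a backslash followed by "1".
assert len(r"\1") == 2

sample = "fooBar"  # hypothetical input, chosen only for this demonstration

# Non-raw replacement: re.sub() receives control characters, not group references.
broken = re.sub(r'([a-z\d])([A-Z])', "\1_\2", sample)

# Raw replacement: re.sub() receives \1 and \2 and substitutes the captured groups.
fixed = re.sub(r'([a-z\d])([A-Z])', r"\1_\2", sample)

print(repr(broken))
print(repr(fixed))
```

Printing with repr() makes the control characters visible as \x01 
and \x02 instead of whatever glyphs the terminal picks for them.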

-tkc





