re.sub() backreference bug?

jemminger at gmail.com
Thu Aug 17 17:26:09 EDT 2006


using this code:

import re
s = 'HelloWorld19-FooBar'
s = re.sub(r'([A-Z]+)([A-Z][a-z])', "\1_\2", s)
s = re.sub(r'([a-z\d])([A-Z])', "\1_\2", s)
s = re.sub('-', '_', s)
s = s.lower()
print "s: %s" % s

i expect to get:
hello_world19_foo_bar

but instead i get:
hell☺_☻orld19_fo☺_☻ar

(in case the above doesn't come across the same, it's:
hellX_Yorld19_foX_Yar, where X is a white smiley face and Y is a black
smiley face !!)

is this a bug, or am i doing something wrong?

tested on
Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)]
on win32

and
Python 2.4.4c0 (#2, Jul 30 2006, 15:43:58) [GCC 4.1.2 20060715
(prerelease) (Debian 4.1.1-9)] on linux2
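
A likely cause, offered as a sketch and not as something stated in the post itself: in an ordinary (non-raw) Python string literal, "\1" and "\2" are octal escapes for the characters with codes 1 and 2, which some consoles draw as smiley faces, so re.sub() never receives a backreference. Writing the replacement as a raw string, r'\1_\2', keeps the backslashes intact. The helper name `underscore` below is hypothetical, and print() is called in function form only so the sketch also runs on current Python.

```python
import re

# In a non-raw literal, "\1" and "\2" are octal escapes, not backreferences.
literal = "\1_\2"
assert literal == '\x01_\x02'   # three characters: chr(1), '_', chr(2)

def underscore(s):
    # Hypothetical helper wrapping the steps from the post,
    # with raw-string replacements so re.sub() sees \1 and \2.
    s = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', s)
    s = re.sub(r'([a-z\d])([A-Z])', r'\1_\2', s)
    s = s.replace('-', '_')
    return s.lower()

result = underscore('HelloWorld19-FooBar')
print("s: %s" % result)   # s: hello_world19_foo_bar
```

Escaping the backslashes by hand ("\\1_\\2") should behave the same way; the raw string is just easier to read.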



