Regular expression bug?

Ron Garret rNOSPAMon at flownet.com
Thu Feb 19 13:55:01 EST 2009


I'm trying to split a CamelCase string into its constituent components.  
This kind of works:

>>> re.split('[a-z][A-Z]', 'fooBarBaz')
['fo', 'a', 'az']

but it consumes the boundary characters.  To fix this I tried using 
lookahead and lookbehind patterns instead, but then re.split does not 
split the string at all:

>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
['fooBarBaz']

However, it does seem to work with findall:

>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
['', '']
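
To double-check where those empty matches actually fall, here is a small 
sketch using finditer (the variable names are just mine, for 
illustration).  It records the position of each zero-width match rather 
than the empty string it matched:

```python
import re

# Zero-width pattern: a lowercase letter behind, an uppercase letter ahead.
boundary = re.compile(r'(?<=[a-z])(?=[A-Z])')

# Every match is empty, so look at where it occurs, not at what it matched.
positions = [m.start() for m in boundary.finditer('fooBarBaz')]
print(positions)  # [3, 6]
```

Positions 3 and 6 are exactly the two CamelCase boundaries I expected.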

So the regular expression seems to be doing the Right Thing.  Is this a 
bug in re.split, or am I missing something?
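
In the meantime, a possible workaround, as a sketch only: since re.sub 
does seem to act on the empty matches, I can mark each boundary with a 
separator and then use the plain string split.  The helper name 
split_camel_case is my own invention, and the sketch assumes the input 
contains no spaces to begin with:

```python
import re

def split_camel_case(name):
    # Hypothetical helper, not part of the re module.
    # Insert a space at every lowercase-to-uppercase boundary...
    marked = re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', name)
    # ...then split on that space (assumes name has no spaces of its own).
    return marked.split(' ')

print(split_camel_case('fooBarBaz'))  # ['foo', 'Bar', 'Baz']
```

That gets me the components without consuming the boundary characters, 
but it does not answer the question about re.split itself.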

(BTW, I tried looking at the source code for the re module, but I could 
not find the relevant code.  re.split calls sre_compile.compile().split, 
but the string 'split' does not appear in sre_compile.py.  So where does 
this method come from?)

I'm using Python 2.5.

Thanks,
rg


