schizophrenic view of what is white space
MRAB
google at mrabarnett.plus.com
Thu Dec 4 13:06:38 EST 2008
Robin Becker wrote:
> Jean-Paul Calderone wrote:
> .........
>>
>> You have to give the re module an additional hint that you care about
>> unicode:
>>
>> exarkun at charm:~$ python
>> Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu
>> 4.2.3-2ubuntu7)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import re
>> >>> print re.compile(r'\s').search(u'a\xa0b')
>> None
>> >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
>> <_sre.SRE_Match object at 0xb7dbb3a0>
>> >>>
>>
>> Jean-Paul
> .......
>
> so the default behaviour differs between the unicode methods and re
> working on unicode. I suppose that won't be true in Python 3.
>
I'm not sure why the Unicode flag is needed in the API. I reckon that
the re module should just look at the text that the regular expression
is being applied to: if it's Unicode then follow the Unicode rules, if
not then don't.
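
As a rough sketch of that idea, assuming Python 3 (not the 2.5 interpreter in
the session quoted above), where str patterns follow the Unicode rules by
default and bytes patterns follow the ASCII rules. The helper name find_space
is made up for illustration and is not part of the re module:

```python
import re

def find_space(text):
    """Hypothetical helper: pick the whitespace rules from the type of text."""
    if isinstance(text, str):
        # str pattern: Unicode rules are the default in Python 3
        pattern = re.compile(r'\s')
    else:
        # bytes pattern: only ASCII whitespace is recognised
        pattern = re.compile(br'\s')
    return pattern.search(text)

print(find_space('a\xa0b'))    # a match object: U+00A0 counts as whitespace
print(find_space(b'a\xa0b'))   # None: byte 0xa0 is not ASCII whitespace

# Asking for the ASCII rules on a str has to be done explicitly.
print(re.search(r'\s', 'a\xa0b', re.ASCII))   # None
```

In that setup the hint moves the other way round: Unicode text gets the
Unicode rules unless re.ASCII is passed.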