schizophrenic view of what is white space

MRAB google at mrabarnett.plus.com
Thu Dec 4 13:06:38 EST 2008


Robin Becker wrote:
> Jean-Paul Calderone wrote:
> .........
>>
>> You have to give the re module an additional hint that you care about
>> unicode:
>>
>>  exarkun at charm:~$ python
>>  Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
>>  [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
>>  Type "help", "copyright", "credits" or "license" for more information.
>>  >>> import re
>>  >>> print re.compile(r'\s').search(u'a\xa0b')
>>  None
>>  >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
>>  <_sre.SRE_Match object at 0xb7dbb3a0>
>>  >>>
>>
>> Jean-Paul
> .......
> 
> So the default behaviour differs between unicode itself and re working
> on unicode. I suppose that won't be true in Python 3.
>
I'm not sure why the Unicode flag is needed in the API. I reckon that it 
should just look at the text that the regular expression is being 
applied to: if it's Unicode then follow the Unicode rules, if not then 
don't.
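
For comparison, here is a minimal sketch of how this plays out, assuming
Python 3 rather than the 2.5 interpreter quoted above. There the re module
does key off the type of the pattern and text: str patterns follow the
Unicode rules by default, bytes patterns do not, and re.ASCII opts a str
pattern back out. The helper name whitespace_views is made up for
illustration only.

```python
import re

# 'a', U+00A0 NO-BREAK SPACE, 'b' as text and as raw bytes.
NBSP_TEXT = 'a\xa0b'
NBSP_BYTES = b'a\xa0b'


def whitespace_views(text=NBSP_TEXT, data=NBSP_BYTES):
    """Hypothetical helper: report which APIs treat U+00A0 as white space."""
    return {
        # str pattern on str text: Unicode rules apply without any flag.
        'str_default': re.search(r'\s', text) is not None,
        # re.ASCII restricts \s to the ASCII white space characters.
        'str_ascii': re.search(r'\s', text, re.ASCII) is not None,
        # bytes pattern on bytes data: ASCII-only matching.
        'bytes_default': re.search(br'\s', data) is not None,
        # str.split() with no argument splits on Unicode white space.
        'str_split': text.split(),
    }


if __name__ == '__main__':
    for name, result in whitespace_views().items():
        print(name, result)
```

Under that assumption the str pattern and str.split() agree that U+00A0 is
white space, while the re.ASCII and bytes variants do not match it.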


