schizophrenic view of what is white space
Jean-Paul Calderone
exarkun at divmod.com
Thu Dec 4 09:33:37 EST 2008
On Thu, 04 Dec 2008 14:27:49 +0000, Robin Becker <robin at reportlab.com> wrote:
>Is Python of two minds about what is white space? I notice that split and
>strip seem to regard u'\xa0' (NO-BREAK SPACE) as white space, but that
>character is not matched by the \s pattern. If this difference is intended,
>can we rely on it continuing?
>
>
> >>> u'a b'.split()
>[u'a', u'b']
> >>> u'a\xa0b'.split()
>[u'a', u'b']
> >>> re.compile(r'\s').search(u'a b')
><_sre.SRE_Match object at 0x00DBB2C0>
> >>> re.compile(r'\s').search(u'a\xa0b')
> >>>
>
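You have to give the re module an additional hint that you care about
Unicode: compile the pattern with the re.U (re.UNICODE) flag, which makes
\s follow the Unicode character properties database: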
exarkun at charm:~$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> print re.compile(r'\s').search(u'a\xa0b')
None
>>> print re.compile(r'\s', re.U).search(u'a\xa0b')
<_sre.SRE_Match object at 0xb7dbb3a0>
>>>
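
If you want a regex-based split that agrees with unicode.split() on
characters like NO-BREAK SPACE, something along these lines might do. This
is only a rough sketch: the names UNICODE_WS and split_on_whitespace are
made up for illustration, and I have only reasoned about the simple cases
shown. On Python 3, str patterns are already Unicode-aware by default, so
the flag should be redundant there but harmless.

```python
import re

# Hypothetical helper names, not part of the re module.
# re.UNICODE makes \s use the Unicode character properties database,
# so u'\xa0' (NO-BREAK SPACE) counts as white space.
UNICODE_WS = re.compile(r'\s+', re.UNICODE)


def split_on_whitespace(text):
    """Split text on runs of Unicode white space, dropping empty pieces.

    re.split() returns empty strings for leading or trailing separators,
    so they are filtered out to mimic text.split() with no arguments.
    """
    return [piece for piece in UNICODE_WS.split(text) if piece]


if __name__ == "__main__":
    print(split_on_whitespace(u'a\xa0b'))
```

The filtering step is there because re.split() and split() disagree about
leading and trailing separators, not because of anything Unicode-specific.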
Jean-Paul