schizophrenic view of what is white space

Jean-Paul Calderone exarkun at divmod.com
Thu Dec 4 09:33:37 EST 2008


On Thu, 04 Dec 2008 14:27:49 +0000, Robin Becker <robin at reportlab.com> wrote:
>Is Python of two minds about what is white space? I notice that split and
>strip seem to regard u'\xa0' (NO-BREAK SPACE) as white space, but that
>character is not matched by the \s pattern. If this difference is intended,
>can we rely on it continuing?
>
>
> >>> u'a b'.split()
>[u'a', u'b']
> >>> u'a\xa0b'.split()
>[u'a', u'b']
> >>> re.compile(r'\s').search(u'a b')
><_sre.SRE_Match object at 0x00DBB2C0>
> >>> re.compile(r'\s').search(u'a\xa0b')
> >>>
>

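You have to give the re module an additional hint that you care about
unicode. Without flags, \s only matches the ASCII whitespace characters;
with re.UNICODE (re.U for short) it follows the Unicode definition of
whitespace, which includes NO-BREAK SPACE: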

  exarkun at charm:~$ python
  Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) 
  [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import re
  >>> print re.compile(r'\s').search(u'a\xa0b')
  None
  >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
  <_sre.SRE_Match object at 0xb7dbb3a0>
  >>>
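
If you want a regex-based split that agrees with unicode.split(), something
along these lines should do as a rough sketch. The names UNICODE_WS and
split_ws are made up for illustration, and I'm assuming plain
whitespace-separated fields are all you need:

```python
import re

# re.UNICODE makes \s follow the Unicode whitespace definition,
# so NO-BREAK SPACE (u'\xa0') counts as whitespace.
UNICODE_WS = re.compile(r'\s+', re.UNICODE)

def split_ws(text):
    """Split text on runs of Unicode whitespace, dropping empty fields."""
    return [part for part in UNICODE_WS.split(text) if part]

text = u'a\xa0b c'
# Both approaches should now treat u'\xa0' as a separator.
assert split_ws(text) == text.split()
```

On Python 3 the u prefix and re.UNICODE are accepted but redundant, since
str patterns already use Unicode matching by default.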

Jean-Paul


