split on NO-BREAK SPACE

Carsten Haese carsten at uniqsys.com
Sun Jul 22 12:13:36 EDT 2007


On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote: 
> > It's a feature. See help(str.split): "If sep is not specified or is
> > None, any whitespace string is a separator."
> 
> Define "any whitespace".

Any string for which isspace returns True.

> Why is it different in <type 'str'> and <type 'unicode'>?

>>> '\xa0'.isspace()
False
>>> u'\xa0'.isspace()
True

For byte strings, Python can't know whether 0xA0 is whitespace: whether
byte value 160 corresponds to a whitespace character depends on the
encoding. For unicode strings, code point 160 is unquestionably
whitespace, because it is U+00A0 NO-BREAK SPACE.
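
To make the encoding point concrete, here is a minimal sketch. It assumes
Python 3, where bytes plays the role of the old <type 'str'> and str the
role of <type 'unicode'>; the variable names and the choice of latin-1 and
cp437 as sample encodings are only for illustration.

```python
# Assumption: Python 3, where bytes ~ old 'str' and str ~ old 'unicode'.
raw = b"\xa0"  # a single byte with value 160, encoding unknown

# bytes.isspace() only recognises ASCII whitespace, so this is False.
byte_is_space = raw.isspace()

# The same byte decodes to different characters in different encodings.
as_latin1 = raw.decode("latin-1")  # U+00A0 NO-BREAK SPACE
as_cp437 = raw.decode("cp437")     # U+00E1, a lowercase a with acute accent

latin1_is_space = as_latin1.isspace()
cp437_is_space = as_cp437.isspace()

# split() with no separator follows isspace() for each type.
byte_parts = b"a\xa0b".split()
text_parts = "a\u00a0b".split()

print(byte_is_space, latin1_is_space, cp437_is_space)
print(byte_parts, text_parts)
```

Once the byte is decoded, Python knows which character it has and can
answer the whitespace question; before that, it cannot.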

> Why does split() split when it says NO-BREAK?

Precisely. It says NO-BREAK. It doesn't say NO-SPLIT.
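
If the goal is to keep text joined by a no-break space in one piece, one
option is to name the separator explicitly instead of relying on the
default. The following is only a sketch, assuming Python 3 and a made-up
sample string; the ASCII-only pattern is my own choice, not something
split() provides.

```python
import re

# Hypothetical sample: "10" and "km" are joined by U+00A0 NO-BREAK SPACE.
text = "10\u00a0km away"

# Default split: any character for which isspace() is True separates.
default_parts = text.split()

# Explicit separator: only U+0020 separates. Note that runs of spaces
# are not collapsed in this mode and would yield empty strings.
space_only_parts = text.split(" ")

# Splitting on runs of ASCII whitespace only, via an explicit class.
ascii_ws = " \t\n\r\f\v"
ascii_ws_parts = re.split("[" + ascii_ws + "]+", text.strip(ascii_ws))

print(default_parts)
print(space_only_parts)
print(ascii_ws_parts)
```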

-- 
Carsten Haese
http://informixdb.sourceforge.net




