split on NO-BREAK SPACE

Peter Kleiweg p.c.j.kleiweg at rug.nl
Sun Jul 22 11:44:51 EDT 2007


Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:

> On Sun, 2007-07-22 at 17:15 +0200, Peter Kleiweg wrote:
> > Is this a bug or a feature?
> > 
> > 
> >     Python 2.4.4 (#1, Oct 19 2006, 11:55:22) 
> >     [GCC 2.95.3 20010315 (SuSE)] on linux2
> > 
> >     >>> a = 'a b c\240d e'
> >     >>> a
> >     'a b c\xa0d e'
> >     >>> a.split()
> >     ['a', 'b', 'c\xa0d', 'e']
> >     >>> a = a.decode('latin-1')
> >     >>> a
> >     u'a b c\xa0d e'
> >     >>> a.split()
> >     [u'a', u'b', u'c', u'd', u'e']
> 
> It's a feature. See help(str.split): "If sep is not specified or is
> None, any whitespace string is a separator."

Define "any whitespace".
Why is it different in <type 'str'> and <type 'unicode'>?
Why does split() split when it says NO-BREAK?
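
For reference, a minimal sketch of the difference in question. It is
written for Python 3, not the 2.4.4 session quoted above, on the assumption
that Python 3 `bytes` stands in for the old `str` type and Python 3 `str`
for the old `unicode` type. The variable names are only illustrative.

```python
import unicodedata

# Byte string containing 0xA0, like the 2.x <type 'str'> example.
raw = b'a b c\xa0d e'
# Decoded text, like the 2.x <type 'unicode'> example.
text = raw.decode('latin-1')

# bytes.split() with no separator splits on ASCII whitespace only,
# so the 0xA0 byte stays inside its word.
bytes_parts = raw.split()

# str.split() with no separator splits on every character for which
# str.isspace() is true, and that includes U+00A0.
text_parts = text.split()

# What the Unicode database says about U+00A0.
nbsp = '\xa0'
nbsp_name = unicodedata.name(nbsp)          # official character name
nbsp_category = unicodedata.category(nbsp)  # general category
nbsp_is_space = nbsp.isspace()

# One way to keep the no-break space inside a word: name the separator.
# Note that split(' ') does not collapse runs of spaces the way split() does.
explicit_parts = text.split(' ')

print(bytes_parts)
print(text_parts)
print(nbsp_name, nbsp_category, nbsp_is_space)
print(explicit_parts)
```

So "whitespace" here seems to mean the separator category of the character,
not its line-breaking behaviour; passing an explicit separator avoids the
question.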

-- 
Peter Kleiweg  L:NL,af,da,de,en,ia,nds,no,sv,(fr,it)  S:NL,de,en,(da,ia)
info: http://www.let.rug.nl/kleiweg/ls.html


