split on NO-BREAK SPACE
Steve Holden
steve at holdenweb.com
Sun Jul 22 18:13:45 EDT 2007
Jean-Paul Calderone wrote:
> On Sun, 22 Jul 2007 21:13:02 +0200, Peter Kleiweg <p.c.j.kleiweg at rug.nl> wrote:
>> Carsten Haese wrote on the 22nd day of the hay month (July) of the year 2007:
>>
>>> On Sun, 2007-07-22 at 17:44 +0200, Peter Kleiweg wrote:
>>>>> It's a feature. See help(str.split): "If sep is not specified or is
>>>>> None, any whitespace string is a separator."
>>>> Define "any whitespace".
>>> Any string for which isspace returns True.
>> Define white space to isspace()
>>
>>>> Why is it different in <type 'str'> and <type 'unicode'>?
>>> >>> '\xa0'.isspace()
>>> False
>>> >>> u'\xa0'.isspace()
>>> True
>> Here is another "space":
>>
>> >>> u'\uFEFF'.isspace()
>> False
>>
>> isspace() is inconsistent
>
> It's only inconsistent if you think it should behave based on the
> name of a unicode code point. It doesn't use the name, though. It
> uses the category. NO-BREAK SPACE is in the Zs category (Separator, Space).
> ZERO WIDTH NO-BREAK SPACE is in the Cf category (Other, Format).
>
> Maybe that makes unicode inconsistent (I won't try to argue either way),
> but it's pretty clear that isspace is being consistent based on the data
> it has to work with.
>
Well, if you're going to start answering questions with FACTS, how can
questioners rely on their prejudices to guide them any more?
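
For anyone who would rather check the facts quoted above than take them on
trust, here is a minimal sketch using the standard unicodedata module. It
assumes Python 3, where str is the Unicode type and bytes stands in for the
<type 'str'> of this thread; the helper name describe() is made up for
illustration. Note that the general category is only part of what isspace()
consults (the documentation also mentions bidirectional classes), so treat
this as a way to inspect the data, not as a definition.

```python
import unicodedata

# The two code points discussed in the thread.
CODE_POINTS = ["\u00a0", "\ufeff"]


def describe(ch):
    """Return (name, general category, isspace result) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()


for ch in CODE_POINTS:
    name, category, is_space = describe(ch)
    print("U+%04X %-26s %s isspace=%s" % (ord(ch), name, category, is_space))

# Text strings treat U+00A0 as a separator when split() gets no argument;
# byte strings only recognise ASCII whitespace, so the byte 0xA0 is kept.
print("a\u00a0b".split())
print(b"a\xa0b".split())
```

If the difference between the two string types matters in your program,
passing an explicit separator to split() avoids relying on either
definition of whitespace.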
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden