stupid perl question

John Machin sjmachin at lexicon.net
Tue May 30 19:58:52 EDT 2006


On 31/05/2006 5:55 AM, Jorgen Grahn wrote:
> On Sat, 27 May 2006 11:11:40 +1000, John Machin <sjmachin at lexicon.net> wrote:
> ...
>> Yes, you could write out the whitespace characters for the 8-bit 
>> encoding of your choice, or you could find them using Python (and get 
>> some possibly surprising answers):
>>
>>>>> mkws = lambda enc, sz=256: "".join([chr(i) for i in range(sz) if chr(i).decode(enc, 'ignore').isspace()])
> ...
>>>>> mkws('latin1')
>> '\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0'
>                                       ^^^^
> That surprised me, at least. Should NO-BREAK SPACE really count as
> whitespace?

NO-BREAK SPACE is a space. Of course it should return True when fed to 
isspace(). Whitespace is a silly term, anyway (IMHO); is there such a 
thing as a space that is not white?
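
For anyone repeating the experiment on Python 3, where chr(i).decode() 
no longer exists, here is a rough equivalent of the mkws one-liner quoted 
above. It is only a sketch: the name mkws is kept from the original, but 
the bytes([i]) round-trip is my substitution for the Python 2 idiom.

```python
def mkws(enc, sz=256):
    """Return every character below sz that the codec enc decodes to whitespace."""
    found = []
    for i in range(sz):
        # Decode one byte at a time; 'ignore' yields '' for undecodable bytes,
        # and ''.isspace() is False, so those bytes are skipped.
        ch = bytes([i]).decode(enc, "ignore")
        if ch.isspace():
            found.append(ch)
    return "".join(found)


latin1_ws = mkws("latin1")
ascii_ws = mkws("ascii")
print(repr(latin1_ws))
print(repr(ascii_ws))
```

On a current CPython the latin1 result should again include '\x85' and 
'\xa0', while the ascii result stops at the plain space.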

> I thought that the whole purpose with it was to have a blank
> character which programs automatically treated as non-whitespace, for
> line-breaking, word-counting and similar purposes.

Yes, but the concept of things like split() splitting on ASCII 
"whitespace" evidently predated (or ignored!) the no-break space that 
appears in various word-processors. Automatically?? Sure, it counts as 
non-breaking for line-breaking purposes, but some applications might 
want to treat it as a word-separator. It pays to look at what's in one's 
data, and find out what the tools and functions are actually doing with 
it.
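
To make that concrete, here is a small sketch of "looking at the data" in 
Python 3. The sample string and variable names are made up for 
illustration. It lists which whitespace characters are actually present, 
then compares str.split() (which treats NO-BREAK SPACE as a separator) 
with a regex split restricted to ASCII whitespace.

```python
import re

# Hypothetical sample: \xa0 is NO-BREAK SPACE between "10" and "km".
text = "10\xa0km to\tgo"

# Which whitespace characters does the data really contain?
spaces_seen = sorted({c for c in text if c.isspace()})

# str.split() with no argument splits on any run of Unicode whitespace.
unicode_words = text.split()

# With re.ASCII, \s matches only [ \t\n\r\f\v], so \xa0 stays inside its word.
ascii_words = re.split(r"\s+", text, flags=re.ASCII)

print(spaces_seen)
print(unicode_words)
print(ascii_words)
```

Which of the two splits is "right" depends on the application: a 
word-counter may want the first, a line-breaker the second.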

Cheers,
John


