stupid perl question

John Machin sjmachin at lexicon.net
Fri May 26 21:11:40 EDT 2006


On 27/05/2006 9:51 AM, BJörn Lindqvist wrote:
>> how can i split a string that contains white spaces and '_'
>>
>> any clue?
> 
> If the white spaces and the '_' should be applied equivalently on the
> input and you can enumerate all white space characters, you could do
> like this:

Yes, you could write out the whitespace characters for the 8-bit 
encoding of your choice, or you could find them using Python (and get 
some possibly surprising answers):

 >>> mkws = lambda enc, sz=256: "".join([chr(i) for i in range(sz)
 ...     if chr(i).decode(enc, 'ignore').isspace()])
 >>> mkws('cp1252')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \xa0'
 >>> mkws('latin1')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0'
 >>> mkws('cp1251')
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \xa0'
 >>> mkws('ascii', 128)
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f '

and compare the last one with the result for the C locale:

 >>> "".join([chr(i) for i in range(256) if chr(i).isspace()])
'\t\n\x0b\x0c\r '
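
The session above relies on Python 2, where chr(i) is a byte string with a
decode method. On Python 3 a rough equivalent might look like the sketch
below. The name whitespace_bytes is made up for illustration, and the
approach (decode each byte, then ask the resulting text whether it is
whitespace) is just one way to do it:

```python
def whitespace_bytes(encoding, size=256):
    """Return the byte values below `size` that decode to whitespace."""
    found = []
    for i in range(size):
        # Bytes that are undefined in the encoding decode to '' here,
        # and ''.isspace() is False, so they are skipped.
        text = bytes([i]).decode(encoding, "ignore")
        if text.isspace():
            found.append(i)
    return bytes(found)

# bytes.isspace() only knows about ASCII whitespace, which corresponds
# to the C-locale result shown above.
ascii_only = bytes(i for i in range(256) if bytes([i]).isspace())
```

Comparing whitespace_bytes('ascii', 128) with ascii_only shows the same
difference as above: the Unicode-aware test also counts \x1c to \x1f.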

> 
> def split_helper(list, delims):
>    if not delims:
>        return list
>    ch = delims[0]
>    lst = []
>    for item in list:
>        lst += split_helper(item.split(ch), delims[1:])
>    return lst
> 
> def split(str, delims):
>    return split_helper([str], delims)
> 
> >>> split("foo_bar eh", "_ ")
> ['foo', 'bar', 'eh']
> 
> Though I bet someone will post a one-line solution in the next 30 
> minutes. :)

Two one-liners, depending on what the OP really wants:

 >>> import re
 >>> re.split(r"[\s_]", "foo_bar      zot plugh _ xyzzy")
['foo', 'bar', '', '', '', '', '', 'zot', 'plugh', '', '', 'xyzzy']

which is what your ever-so-slightly-baroque effort does :-)
or

 >>> re.split(r"[\s_]+", "foo_bar      zot plugh _ xyzzy")
['foo', 'bar', 'zot', 'plugh', 'xyzzy']
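
One caveat with the "+" form: if the string starts or ends with a
delimiter, re.split still returns an empty string at that end. A minimal
sketch of two ways around that, assuming only the non-empty pieces are
wanted (the name tokens and the sample string are made up for
illustration):

```python
import re

def tokens(text):
    # Match runs of characters that are neither whitespace nor '_',
    # rather than splitting on the delimiters themselves.
    return re.findall(r"[^\s_]+", text)

sample = "_foo_bar   zot_ "

# Splitting keeps an empty string at each end of this sample ...
split_result = re.split(r"[\s_]+", sample)

# ... which can be filtered out afterwards.
filtered = [piece for piece in split_result if piece]
```

Both tokens(sample) and filtered give just the three words.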

Cheers,
John


