newby question: Splitting a string - separator

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sat Dec 10 00:46:52 EST 2005


On Fri, 09 Dec 2005 18:02:02 -0800, James Stroud wrote:

> Thomas Liesner wrote:
>> Hi all,
>> 
>> I have a text file which contains a single string of names.
>> I want to split this string into its records and put them into a list.
>> In "normal" cases I would do something like:
>> 
>> 
>>>#!/usr/bin/python
>>>inp = open("file")
>>>data = inp.read()
>>>names = data.split()
>>>inp.close()
>> 
>> 
>> The problem is that the names contain spaces, and the records are also
>> just separated by spaces. The only thing I can rely on is that the
>> record separator is always more than a single whitespace character.
>> 
>> I thought of defining the separator for split() by using a regex for
>> "more than one whitespace". The regex for whitespace is \s, but
>> what would I use for "more than one"? \s+?
>> 
>> TIA,
>> Tom
> 
> The one I like best goes like this:
> 
> py> data = "Guido van Rossum  Tim Peters     Thomas Liesner"
> py> names = [n for n in data.split() if n]
> py> names
> ['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']
> 
> I think it is theoretically faster (and more pythonic) than using regexes.


Yes, but the correct result would be:

['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

Your code is short and elegant, but wrong.

It could also be shorter and more elegant:

# your version
py> data = "Guido van Rossum  Tim Peters     Thomas Liesner"
py> [n for n in data.split() if n]
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# my version
py> data = "Guido van Rossum  Tim Peters     Thomas Liesner"
py> data.split()
['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

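The "if n" in the list comp is superfluous: split() with no arguments
splits on runs of whitespace and never returns empty strings, so there is
nothing to filter out. Without the filter, the whole list comp is
unnecessary.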
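
For what it's worth, one way to get the three-name result is to split on
runs of two or more whitespace characters, which is what Thomas says his
record separator always is. This is only a sketch under that assumption;
the helper name split_records is made up for illustration, and \s{2,} is
the regex for "two or more whitespace characters" (\s+ would also match a
single space and so split inside the names).

```python
import re

def split_records(text):
    """Split text into records separated by two or more whitespace chars."""
    # strip() first, so leading/trailing whitespace (for example the
    # newline at the end of a file) does not produce empty records.
    return re.split(r"\s{2,}", text.strip())

data = "Guido van Rossum  Tim Peters     Thomas Liesner"
names = split_records(data)
print(names)
# ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
```

Note that an empty or all-whitespace input gives [''] rather than [], so
filter that case separately if it can occur.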



-- 
Steven.



