newby question: Splitting a string - separator
Michael Spencer
mahs at telcopartners.com
Fri Dec 9 23:43:17 EST 2005
bonono at gmail.com wrote:
> Thomas Liesner wrote:
>> Hi all,
>>
>> I have a text file which contains a single string with names.
>> I want to split this string into its records and put them into a list.
>> In "normal" cases I would do something like:
>>
>>     #!/usr/bin/python
>>     inp = open("file")
>>     data = inp.read()
>>     names = data.split()
>>     inp.close()
>> The problem is that the names contain spaces and the records are also
>> just separated by spaces. The only thing I can rely on is that the
>> record separator is always more than a single whitespace.
>>
>> I thought of something like defining the separator for split() by using
>> a regex for "more than one whitespace". The regex for whitespace is \s, but
>> what would I use for "more than one"? \s+?
>>
> Can I just use "two spaces" as the separator?
>
> [ x.strip() for x in data.split("  ") ]
>
If you like, but it will create dummy (empty) entries wherever a separator is four or more spaces long:
>>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
>>> [ x.strip() for x in data.split("  ") ]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
You could add a condition to the listcomp:
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
But what if the separator contains some other whitespace character, such as a tab?
>>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
>>> [name.strip() for name in data.split("  ") if name]
['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
>>>
Perhaps a smarter condition?
>>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
But this is beginning to feel like hard work.
I think this is a case where it's not worth the effort to try to avoid the regexp:
>>> import re
>>> re.split(r"\s{2,}", data)
['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
>>>
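To answer the original "\s+?" question directly: \s+ means one or more whitespace characters, so it would also split inside the names, while \s{2,} means two or more and only matches the record separators. Below is a minimal sketch that also applies this to the file-reading script from the question. The helper names split_records and read_names and the sample string are hypothetical; the strip() call is an added assumption so that a trailing newline from read() does not end up attached to the last name.

```python
import re

def split_records(text):
    """Hypothetical helper: split on runs of two or more whitespace characters.

    strip() first so leading/trailing whitespace (e.g. the newline read()
    usually returns at end of file) can't yield empty or '\\n'-suffixed entries.
    """
    text = text.strip()
    if not text:
        return []  # re.split would return [''] for an empty string
    return re.split(r"\s{2,}", text)

def read_names(path):
    """Hypothetical helper: read the whole file, as in the original script."""
    with open(path) as inp:
        return split_records(inp.read())

# Hypothetical sample: single spaces inside names, runs of two or more
# whitespace characters (including a tab) between records, trailing newline.
data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner\n"

# \s+ means ONE or more whitespace characters, so it splits inside names too.
print(re.split(r"\s+", data.strip()))
# ['Guido', 'van', 'Rossum', 'Tim', 'Peters', 'Thomas', 'Liesner']

# \s{2,} means TWO or more, so only the record separators match.
print(split_records(data))
# ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
```

With the file name from the question, usage would be names = read_names("file"); the with statement closes the file even if reading fails.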
Michael