newby question: Splitting a string - separator

Fri Dec 9 23:43:17 EST 2005

bonono at gmail.com wrote:
> Thomas Liesner wrote:
>> Hi all,
>>
>> I have a text file which contains a single string with names.
>> I want to split this string into its records and put them into a list.
>> In "normal" cases I would do something like:
>>
>>> #!/usr/bin/python
>>> inp = open("file")
>>> data = inp.read()
>>> names = data.split()
>>> inp.close()
>> The problem is that the names contain spaces and the records are also
>> just separated by spaces. The only thing I can rely on is that the
>> record separator is always more than a single whitespace.
>>
>> I thought of something like defining the separator for split() by using
>> a regex for "more than one whitespace". The regex for whitespace is \s, but
>> what would I use for "more than one"? \s+?
>>
> Can I just use "two spaces" as the separator?
> 
> [ x.strip() for x in data.split("  ") ]
> 
If you like, but it will create dummy entries wherever there are four or more consecutive spaces:

  >>> data = "Guido van Rossum  Tim Peters    Thomas Liesner"
  >>> [ x.strip() for x in data.split("  ") ]
  ['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
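To see exactly when those empty entries appear, here is a small sketch that applies the same list comprehension to two names separated by runs of 2 to 5 spaces (`two_space_split` is just a name made up for this illustration). Runs of two or three spaces come out clean, because strip() removes the odd leftover space; from four spaces on, an empty string slips in.

```python
def two_space_split(data):
    """The list comprehension from above, wrapped in a hypothetical helper."""
    return [x.strip() for x in data.split("  ")]

# Separate the same two names with runs of 2, 3, 4 and 5 spaces.
for n in range(2, 6):
    data = "Tim Peters" + " " * n + "Thomas Liesner"
    print(n, two_space_split(data))

# 2 ['Tim Peters', 'Thomas Liesner']
# 3 ['Tim Peters', 'Thomas Liesner']
# 4 ['Tim Peters', '', 'Thomas Liesner']
# 5 ['Tim Peters', '', 'Thomas Liesner']
```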

You could add a condition to the listcomp:

  >>> [name.strip() for name in data.split("  ") if name]
  ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']

but what if there is some other whitespace character?

  >>> data = "Guido van Rossum  Tim Peters  \t  Thomas Liesner"
  >>> [name.strip() for name in data.split("  ") if name]
  ['Guido van Rossum', 'Tim Peters', '', 'Thomas Liesner']
  >>>

perhaps a smarter condition?

  >>> [name.strip() for name in data.split("  ") if name.strip(" \t")]
  ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
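A more general condition, `if name.strip()`, drops any chunk that is pure whitespace, not just spaces and tabs. It still can't help when a separator never contains two adjacent spaces, though. A short sketch, assuming a record boundary written as space-tab-space:

```python
data = "Guido van Rossum \t Tim Peters  Thomas Liesner"

# Filtering on name.strip() discards chunks made of any whitespace...
names = [name.strip() for name in data.split("  ") if name.strip()]

# ...but " \t " contains no two consecutive spaces, so split("  ")
# never breaks there and two names stay glued together.
print(names)  # ['Guido van Rossum \t Tim Peters', 'Thomas Liesner']
```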

but this is beginning to feel like hard work.

I think this is a case where it's not worth the effort to try to avoid the regexp:

  >>> import re
  >>> re.split(r"\s{2,}", data)
  ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
  >>>
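Combined with the file-reading code from the original post, a minimal sketch might look like this. It assumes Python 3 and a file holding nothing but the names; `RECORD_SEP`, `split_records` and `read_names` are made-up names. Note that `\s+`, as guessed in the question, also matches a single space and would split "Guido van Rossum" into three words; `\s{2,}` (or equivalently `\s\s+`) is what "more than one whitespace" needs. Stripping the text first matters because a trailing newline from the file is a single whitespace character, so `\s{2,}` would leave it attached to the last name.

```python
import re

# Two or more consecutive whitespace characters (spaces, tabs, newlines)
# mark a record boundary; single spaces inside a name are left alone.
RECORD_SEP = re.compile(r"\s{2,}")

def split_records(text):
    """Hypothetical helper: split text on runs of 2+ whitespace chars."""
    # Strip first: a trailing "\n" would otherwise stay on the last name,
    # and leading whitespace would produce an empty first entry.
    text = text.strip()
    if not text:
        return []  # re.split on "" would return [''], not []
    return RECORD_SEP.split(text)

def read_names(path):
    """Hypothetical helper: read a file and return its list of names."""
    with open(path) as inp:
        return split_records(inp.read())

print(split_records("Guido van Rossum  Tim Peters  \t  Thomas Liesner\n"))
# ['Guido van Rossum', 'Tim Peters', 'Thomas Liesner']
```

With that in place, read_names("file") would return the list of names for the file from the original post.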

Michael