Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter?

Victor Hooi victorhooi at gmail.com
Tue Jul 28 09:55:08 EDT 2015


I have a line that looks like this:

    14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26

I'd like to split this line on multiple separators - in this case, consecutive whitespace, as well as the pipe symbol (|).

If I run .split() on the line, it will split on consecutive whitespace:

In [17]: f.split()
Out[17]:
['14',
 '*0',
 '330',
 '*0',
 '760',
 '411|0',
 '0',
 '770g',
 '1544g',
 '117g',
 '1414',
 'computedshopcartdb:103.5%',
 '0',
 '30|0',
 '0|1',
 '19m',
 '97m',
 '1538',
 'ComputedCartRS',
 'PRI',
 '09:40:26']

If I try to run .split(' |'), however, I get:

In [18]: f.split(' |')
Out[18]: ['    14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26']
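
The line comes back unsplit because str.split with an argument treats that argument as one literal separator string, not as a set of characters, and the two-character sequence ' |' never occurs in the line. A small sketch of that behaviour, using a made-up sample string rather than the real line:

```python
# str.split(sep) looks for the exact substring sep, not for any of its characters.
sample = "a |b  c|d"   # hypothetical sample, not the real log line

# Splits only where the two-character string " |" occurs.
literal = sample.split(" |")
print(literal)

# No argument: splits on runs of whitespace, leaving the pipes in place.
default = sample.split()
print(default)
```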

I know the re module also has a split function. Unfortunately, it does not collapse consecutive whitespace:

In [19]: re.split(' |', f)
Out[19]:
['',
 '',
 '',
 '',
 '14',
 '',
 '',
 '',
 '',
 '*0',
 '',
 '',
 '',
 '330',
 '',
 '',
 '',
 '',
 '*0',
 '',
 '',
 '',
 '',
 '760',
 '',
 '',
 '411|0',
 '',
 '',
 '',
 '',
 '',
 '',
 '0',
 '',
 '',
 '770g',
 '',
 '1544g',
 '',
 '',
 '117g',
 '',
 '',
 '1414',
 'computedshopcartdb:103.5%',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '0',
 '',
 '',
 '',
 '',
 '',
 '30|0',
 '',
 '',
 '',
 '',
 '0|1',
 '',
 '',
 '',
 '19m',
 '',
 '',
 '',
 '97m',
 '',
 '1538',
 'ComputedCartRS',
 '',
 'PRI',
 '',
 '',
 '09:40:26']

Is there an easy way to split on multiple characters, and also treat consecutive delimiters as a single delimiter?
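
One possible approach, offered as a sketch and not a confirmed answer: pass re.split a character class followed by +, so that any run of whitespace and pipe characters counts as a single delimiter. Inside a character class the | is a literal pipe, whereas an unescaped | elsewhere in a pattern means alternation. The variable name line and the shortened sample below are illustrative assumptions.

```python
import re

# Shortened, illustrative version of the line from the question.
line = "    14     *0    760   411|0       0   30|0     0|1    19m  PRI   09:40:26"

# [\s|]+ matches one or more whitespace or pipe characters in a row.
# strip() removes leading and trailing whitespace first, so this line
# produces no empty strings at either end of the result.
fields = re.split(r"[\s|]+", line.strip())
print(fields)
```

Note that strip() only removes whitespace, so a line that begins or ends with a pipe would still yield an empty string at that end.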


