Trouble splitting strings with consecutive delimiters

Jussi Piitulainen jpiitula at ling.helsinki.fi
Tue May 1 02:14:54 EDT 2012


deuteros writes:

> I'm using regular expressions to split a string using multiple
> delimiters.  But if two or more of my delimiters occur next to each
> other in the string, it puts an empty string in the resulting
> list. For example:
> 
>     re.split(':|;|px', "width:150px;height:50px;float:right")
> 
> Results in
> 
>     ['width', '150', '', 'height', '50', '', 'float', 'right']
> 
> Is there any way to avoid getting '' in my list without adding px;
> as a delimiter?

You could treat a run of one or more delimiters as a single separator.

>>> import re
>>> re.split('(?::|;|px)+', "width:150px;height:50px;float:right")
['width', '150', 'height', '50', 'float', 'right']
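One caveat: a delimiter at the very start or end of the string still leaves a '' at that end, because re.split reports the empty text before the first or after the last separator. A minimal sketch that wraps the same pattern and drops those ends (split_fields is a hypothetical name, not anything in the library):

```python
import re

# Same pattern as above: a run of ':', ';' or 'px' counts as one separator.
_DELIMS = re.compile(r'(?::|;|px)+')

def split_fields(text):
    """Hypothetical helper: split on delimiter runs and drop empty ends."""
    # The filter only removes '' produced by a delimiter at either end;
    # runs in the middle are already collapsed by the '+' in the pattern.
    return [part for part in _DELIMS.split(text) if part]

print(split_fields("width:150px;height:50px;float:right"))
print(split_fields("width:150px;height:50px;"))
```

Filtering with `if part` would also fix the original ':|;|px' pattern on its own, but collapsing the runs in the regex keeps the intent visible in one place.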

Consider splitting twice instead: first at semicolons into key:value
substrings, then each of those at the colon into a key-value pair.
Here the pairs go into a dict. The units are better handled after that.

>>> dict(kv.split(':') for kv in "width:150px;height:50px;float:right".split(';'))
{'width': '150px', 'height': '50px', 'float': 'right'}
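Handling the units afterwards could look something like this. It is a sketch that assumes px is the only unit of interest and that every other value can stay a string; parse_style and _PX_VALUE are hypothetical names. It also splits at the first colon only and skips the empty piece left by a trailing semicolon.

```python
import re

# Hypothetical pattern: a plain number followed by "px", e.g. "150px" or "1.5px".
_PX_VALUE = re.compile(r'(\d+(?:\.\d+)?)px')

def parse_style(text):
    """Split "key:value;key:value" into a dict, then turn px values into numbers."""
    # First pass: semicolons, then the first colon of each piece.
    pairs = dict(kv.split(':', 1) for kv in text.split(';') if kv)
    result = {}
    for key, value in pairs.items():
        match = _PX_VALUE.fullmatch(value)
        if match:
            # Drop the unit; use int unless there is a fractional part.
            number = match.group(1)
            result[key] = float(number) if '.' in number else int(number)
        else:
            # Anything that is not a px value is kept as the original string.
            result[key] = value
    return result

print(parse_style("width:150px;height:50px;float:right"))
```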

You might also want to accept whitespace as part of the delimiters.
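For example, whitespace on either side of each delimiter can be folded into the separator. A minimal sketch, assuming the same three delimiters as above (split_fields_ws is a hypothetical name); the string is stripped first because the pattern only swallows whitespace that sits next to a delimiter:

```python
import re

# Each repetition is: optional whitespace, one delimiter, optional whitespace.
_DELIMS_WS = re.compile(r'(?:\s*(?::|;|px)\s*)+')

def split_fields_ws(text):
    """Hypothetical helper: split on delimiter runs, ignoring surrounding spaces."""
    # strip() handles whitespace at the ends of the whole string;
    # the filter drops '' left by a leading or trailing delimiter.
    return [part for part in _DELIMS_WS.split(text.strip()) if part]

print(split_fields_ws("width: 150px; height : 50px;float: right"))
```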

(There might already be a parser for this kind of data format somewhere
in the standard library. The csv module?)
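As a rough illustration of the csv idea: csv.reader accepts any iterable of lines and a delimiter argument, so it can do the semicolon pass. For input like this it gives the same result as str.split(';'); it only helps when whole fields are quoted, so treat this as a sketch rather than a real parser for this format.

```python
import csv

text = "width:150px;height:50px;float:right"
# A one-element list stands in for a file with a single line.
row = next(csv.reader([text], delimiter=';'))
# Second pass at the first colon of each field, as in the dict example.
style = dict(field.split(':', 1) for field in row)
print(style)
```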
