How to split a string containing nested commas-separated substrings

Matimus mccredie at gmail.com
Wed Jun 18 13:54:29 EDT 2008


On Jun 18, 10:19 am, Robert Dodier <robert.dod... at gmail.com> wrote:
> Hello,
>
> I'd like to split a string by commas, but only at the "top level" so
> to speak. An element can be a comma-less substring, or a
> quoted string, or a substring which looks like a function call.
> If some element contains commas, I don't want to split it.
>
> Examples:
>
> 'foo, bar, baz' => 'foo' 'bar' 'baz'
> 'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
> 'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'
>
> Can someone suggest a suitable regular expression or other
> method to split such strings?
>
> Thank you very much for your help.
>
> Robert

You might look at the shlex module. It doesn't get you 100% of the way,
but it's close:

>>> import shlex
>>> shlex.split('foo, bar, baz')
['foo,', 'bar,', 'baz']
>>> shlex.split('foo, "bar, baz", blurf')
['foo,', 'bar, baz,', 'blurf']
>>> shlex.split('foo, bar(baz, blurf), mumble')
['foo,', 'bar(baz,', 'blurf),', 'mumble']

Using a regular expression will be tricky, especially if arbitrarily
deep nesting is possible: regular expressions in the formal sense can't
match balanced parentheses, and Python's re module has no recursion
support. For a real general-purpose solution you will need to write a
custom parser. There are a couple of modules out there that can help you
with that.

pyparsing is one: http://pyparsing.wikispaces.com/
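
If you would rather not pull in a library, a small hand-written scanner
may be enough for the three examples in the question. The following is
only a rough sketch, not a general parser: the name split_top_level is
made up for illustration, it assumes double quotes are the only quote
character, that there are no backslash escapes, and that parentheses are
the only bracket type. Top-level quotes are dropped to match the desired
output above; quotes inside a call are kept.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside quotes or parentheses.

    Hypothetical helper: handles only double quotes and round brackets.
    """
    parts = []
    current = []
    depth = 0          # current parenthesis nesting level
    in_quotes = False  # True while inside a double-quoted string
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            if depth:
                # Quotes inside a call are part of the element; keep them.
                current.append(ch)
        elif in_quotes:
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing parenthesis")
            current.append(ch)
        elif ch == sep and depth == 0:
            # A separator at the top level ends the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if in_quotes or depth:
        raise ValueError("unbalanced quote or parenthesis")
    parts.append("".join(current).strip())
    return parts


for example in ('foo, bar, baz',
                'foo, "bar, baz", blurf',
                'foo, bar(baz, blurf), mumble'):
    print(split_top_level(example))
```

Note that each element is stripped of surrounding whitespace after the
quotes are removed, so leading or trailing spaces inside a top-level
quoted string are lost. If that matters, the stripping would have to be
done more carefully.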

Matt


