How to split a string containing nested comma-separated substrings

Cédric Lucantis omer at no-log.org
Wed Jun 18 14:12:33 EDT 2008


Hi,

On Wednesday 18 June 2008 19:19:57, Robert Dodier wrote:
> Hello,
>
> I'd like to split a string by commas, but only at the "top level" so
> to speak. An element can be a comma-less substring, or a
> quoted string, or a substring which looks like a function call.
> If some element contains commas, I don't want to split it.
>
> Examples:
>
> 'foo, bar, baz' => 'foo' 'bar' 'baz'
> 'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
> 'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'
>
> Can someone suggest a suitable regular expression or other
> method to split such strings?
>

I'd do something like this. Note that it doesn't check for mismatched
quotes or parentheses, that it removes _all_ the quotes, and that it
also treats top-level spaces as separators:

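def mysplit(text):
    pardepth = 0   # current parenthesis nesting depth
    quote = False  # True while inside a quoted string
    ret = ['']

    for car in text:
        if car == '(':
            pardepth += 1
        elif car == ')':
            pardepth -= 1
        elif car in ('"', "'"):
            quote = not quote
            car = ''  # drop the quote; remove this line to keep the quotes

        # A top-level comma or space ends the current element. The tuple
        # test matters: '' in ', ' is True, so a plain string test would
        # also split right after a closing quote.
        if car in (',', ' ') and not (pardepth or quote):
            if ret[-1] != '':
                ret.append('')
        else:
            ret[-1] += car

    return ret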

# test
for s in ('foo, bar, baz',
          'foo, "bar, baz", blurf',
          'foo, bar(baz, blurf), mumble'):
    print("%r => %r" % (s, mysplit(s)))

# result
'foo, bar, baz' => ['foo', 'bar', 'baz']
'foo, "bar, baz", blurf' => ['foo', 'bar, baz', 'blurf']
'foo, bar(baz, blurf), mumble' => ['foo', 'bar(baz, blurf)', 'mumble']
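
If the limitations noted above matter, the same character-by-character
scan can be made stricter. What follows is only a rough sketch under a
few assumptions that are not part of the function above: the name
split_top_level is hypothetical, it splits on commas only (so top-level
spaces stay inside an element), it remembers which quote character was
opened, it drops quotes only at the top level, and it raises ValueError
on a mismatch.

```python
def split_top_level(text):
    """Split text on commas that are outside quotes and parentheses."""
    items = ['']
    depth = 0     # parenthesis nesting depth
    quote = None  # the quote character currently open, or None

    for ch in text:
        if quote:
            # Inside a quoted string only the matching quote is special.
            if ch == quote:
                quote = None
                if depth == 0:
                    continue  # drop a top-level closing quote
            items[-1] += ch
        elif ch in ('"', "'"):
            quote = ch
            if depth:
                items[-1] += ch  # keep quotes inside parentheses
        elif ch == ',' and depth == 0:
            items.append('')  # top-level comma: start the next element
        else:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced ')' in %r" % text)
            items[-1] += ch

    if quote or depth:
        raise ValueError("unclosed quote or parenthesis in %r" % text)
    # Trim surrounding whitespace (this also trims padding that was
    # inside top-level quotes).
    return [item.strip() for item in items]
```

With this sketch the three examples give the same lists as before,
'foo bar, baz' stays as two elements, and something like 'foo, (bar'
raises ValueError instead of silently swallowing the rest of the string.
One trade-off of this choice: an apostrophe in plain text (as in "it's")
opens a quote, so such input is rejected unless the quote is closed.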


-- 
Cédric Lucantis


