How to split a string containing nested comma-separated substrings

Matimus mccredie at gmail.com
Wed Jun 18 16:55:07 EDT 2008


On Jun 18, 10:54 am, Matimus <mccre... at gmail.com> wrote:
> On Jun 18, 10:19 am, Robert Dodier <robert.dod... at gmail.com> wrote:
>
>
>
> > Hello,
>
> > I'd like to split a string by commas, but only at the "top level" so
> > to speak. An element can be a comma-less substring, or a
> > quoted string, or a substring which looks like a function call.
> > If some element contains commas, I don't want to split it.
>
> > Examples:
>
> > 'foo, bar, baz' => 'foo' 'bar' 'baz'
> > 'foo, "bar, baz", blurf' => 'foo' 'bar, baz' 'blurf'
> > 'foo, bar(baz, blurf), mumble' => 'foo' 'bar(baz, blurf)' 'mumble'
>
> > Can someone suggest a suitable regular expression or other
> > method to split such strings?
>
> > Thank you very much for your help.
>
> > Robert
>
> You might look at the shlex module. It doesn't get you 100% of the way
> there, but it's close:
>
> >>> import shlex
> >>> shlex.split('foo, bar, baz')
> ['foo,', 'bar,', 'baz']
> >>> shlex.split('foo, "bar, baz", blurf')
> ['foo,', 'bar, baz,', 'blurf']
> >>> shlex.split('foo, bar(baz, blurf), mumble')
> ['foo,', 'bar(baz,', 'blurf),', 'mumble']
>
> Using a RE will be tricky, especially if it is possible to have
> recursive nesting (which by definition REs can't handle). For a real
> general purpose solution you will need to create a custom parser.
> There are a couple modules out there that can help you with that.
>
> pyparsing is one: http://pyparsing.wikispaces.com/
>
> Matt
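
For the three example inputs, the "custom parser" route mentioned in the
quoted message can be fairly small. The following is a minimal sketch
rather than a general solution: split_top_level is a hypothetical name,
and the code assumes that only parentheses nest, that quotes are never
escaped, and that top-level quotes should be dropped while quotes inside
a call are kept. Whitespace around each element is stripped.

```python
def split_top_level(text):
    """Split text on commas that are outside quotes and parentheses."""
    parts = []
    current = []     # characters of the element being collected
    depth = 0        # current parenthesis nesting level
    quote = None     # the open quote character, or None

    for ch in text:
        if quote is not None:
            # Inside a quoted string: commas and parentheses are literal.
            if ch == quote:
                quote = None
                if depth > 0:
                    current.append(ch)   # keep quotes inside a call
            else:
                current.append(ch)
        elif ch in "\"'":
            quote = ch
            if depth > 0:
                current.append(ch)       # keep quotes inside a call
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')' in %r" % text)
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A top-level comma ends the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    if quote is not None or depth != 0:
        raise ValueError("unbalanced quote or parenthesis in %r" % text)
    parts.append("".join(current).strip())
    return parts
```

Unlike a regular expression, the depth counter copes with calls nested
inside calls, such as 'f(g(a, b), c), d'.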

Following up on my own post, here is a working example that uses the
built-in _ast module. I posted something similar the other day. It
uses Python's own internal parser to do the work for you. It works in
this case because, at least from what you have posted, your syntax is
also valid Python syntax.

[code]
import _ast

def eval_tuple(text):
    """ Evaluate a string representing a tuple of strings, names and
    calls; returns a tuple of strings.
    """

    ast = compile(text, "<string>", 'eval', _ast.PyCF_ONLY_AST)
    return _traverse(ast.body)

def _traverse(ast):
    """ Traverse the AST, returning string representations of tuples,
    strings, names and calls.
    """
    if isinstance(ast, _ast.Tuple):
        return tuple(_traverse(el) for el in ast.elts)
    elif isinstance(ast, _ast.Str):
        return ast.s
    elif isinstance(ast, _ast.Name):
        return ast.id
    elif isinstance(ast, _ast.Call):
        name = ast.func.id
        args = [_traverse(x) for x in ast.args]
        return "%s(%s)"%(name, ", ".join(args))
    raise SyntaxError()

examples = [
    ('foo, bar, baz', ('foo', 'bar', 'baz')),
    ('foo, "bar, baz", blurf', ('foo', 'bar, baz', 'blurf')),
    ('foo, bar(baz, blurf), mumble', ('foo', 'bar(baz, blurf)', 'mumble')),
    ]

def test():
    for text, expected in examples:
        # Parenthesized single-argument print works as a statement or a function.
        print("trying %r => %r" % (text, expected))
        result = eval_tuple(text)
        if result == expected:
            print("PASS")
        else:
            print("FAIL, GOT: %r" % (result,))

if __name__ == "__main__":
    test()
[/code]
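
The same idea can be written against the public ast module on current
Python 3, where string literals are parsed as ast.Constant nodes. This
is a sketch, not a drop-in replacement: it assumes, as above, that the
input is valid Python expression syntax, and that calls use a plain name
with positional arguments only. Like the version above, it rebuilds each
call from its parts, so spacing is normalized and quotes around string
arguments are not preserved.

```python
import ast


def eval_tuple(text):
    """Split text at top-level commas by parsing it as a Python expression."""
    tree = ast.parse(text, mode="eval")
    return _traverse(tree.body)


def _traverse(node):
    """Return string representations of tuples, strings, names and calls."""
    if isinstance(node, ast.Tuple):
        return tuple(_traverse(el) for el in node.elts)
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # Rebuild the call text from the function name and its arguments.
        args = [_traverse(arg) for arg in node.args]
        return "%s(%s)" % (node.func.id, ", ".join(args))
    raise SyntaxError("unsupported element: %s" % ast.dump(node))
```

Anything outside those four node types, such as a number or an attribute
access, raises SyntaxError rather than being passed through silently.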

Matt


