split string at commas respecting quotes when string not in csv format

R. David Murray rdmurray at bitdance.com
Thu Mar 26 22:45:52 EDT 2009


John Machin <sjmachin at lexicon.net> wrote:
> On Mar 27, 6:51 am, "R. David Murray" <rdmur... at bitdance.com> wrote:
> > OK, I've got a little problem that I'd like to ask the assembled minds
> > for help with.  I can write code to parse this, but I'm thinking it may
> > be possible to do it with regexes.  My regex-fu isn't that good, so if
> > anyone is willing to help (or offer an alternate parsing suggestion)
> > I would be grateful.  (This has to be stdlib only, by the way, I
> > can't introduce any new modules into the application so pyparsing is
> > not an option.)
> >
> > The challenge is to turn a string like this:
> >
> >     a=1,b="0234,)#($)@", k="7"
> >
> > into this:
> >
> >     [("a", "1"), ("b", "0234,)#($)#"), ("k", "7")]
> 
> The challenge is for you to explain unambiguously what you want.
> 
> 1. a=1 => "1" and k="7" => "7" ... is this a mistake or are the quotes
> optional in the original string when not required to protect a comma?

The quotes are optional.

> 2. What is the rule that explains the transmogrification of @ to # in
> your example?

Now that's a mistake :)

> 3. Is the input guaranteed to be syntactically correct?

If it's not, it's the customer that gets to deal with the error.

> The following should do close enough to what you want; adjust as
> appropriate.
> 
>  >>> import re
>  >>> s = """a=1,b="0234,)#($)@", k="7" """
>  >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
>  >>> rx.findall(s)
>  [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
>  >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
>  [('a', '1'), ('b', '2')]
>  >>>

I'm going to save this one and study it, too.  I'd like to learn
to use regexes better, even if I do try to avoid them when possible :)
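
For study purposes, here is a minimal sketch of the same pattern written
with re.VERBOSE so each piece is labelled, plus a small helper that strips
the surrounding double quotes to get the list of tuples asked for
originally.  The names PAIR_RX and parse_pairs are made up for this
illustration, and it assumes, as discussed above, that the input is
syntactically correct (anything that doesn't match is silently skipped).

```python
import re

# John's pattern, spelled out piece by piece.  In VERBOSE mode, whitespace
# outside a character class is ignored, so "[ ]" is still a literal space.
PAIR_RX = re.compile(r'''
    [ ]*                  # optional leading spaces
    (\w+)                 # group 1: the key
    =
    ([^",]+ | "[^"]*")    # group 2: unquoted value, or a double-quoted one
    [ ]*                  # optional trailing spaces
    (?:,|$)               # a comma or the end of the string ends the pair
''', re.VERBOSE)


def parse_pairs(text):
    """Return a list of (key, value) tuples with enclosing quotes removed."""
    pairs = []
    for key, value in PAIR_RX.findall(text):
        # Only strip when the value is wrapped in a matching pair of quotes.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        pairs.append((key, value))
    return pairs


print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
# [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
```

Note that the unquoted alternative, [^",]+, also matches spaces, so an
unquoted value followed by a space before the comma keeps that trailing
space; calling value.strip() in the helper would be one way to deal with
that if it matters.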

--
R. David Murray             http://www.bitdance.com



