split string at commas respecting quotes when string not in csv format

John Machin sjmachin at lexicon.net
Thu Mar 26 18:20:57 EDT 2009


On Mar 27, 8:43 am, Terry Reedy <tjre... at udel.edu> wrote:
> R. David Murray wrote:
> > OK, I've got a little problem that I'd like to ask the assembled minds
> > for help with.  I can write code to parse this, but I'm thinking it may
> > be possible to do it with regexes.  My regex foo isn't that good, so if
> > anyone is willing to help (or offer an alternate parsing suggestion)
> > I would be grateful.  (This has to be stdlib only, by the way, I
> > can't introduce any new modules into the application so pyparsing is
> > not an option.)
>
> > The challenge is to turn a string like this:
>
> >     a=1,b="0234,)#($)@", k="7"
>
> > into this:
>
> >     [("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]
>
> But the starting string IS in csv format, where the values are strings
> with the format name=string.
>
>  >>> import csv
>  >>> myDialect = csv.excel
>  >>> myDialect.skipinitialspace = True # needed for space before 'k'
>  >>> a=list(csv.reader(['''a=1,b="0234,)#($)@", k="7"'''], myDialect))[0]
>  >>> a
> ['a=1', 'b="0234', ')#($)@"', 'k="7"']
>  >>> b=[tuple(s.split('=',1)) for s in a]
>  >>> b
> [('a', '1'), ('b', '"0234'), (')#($)@"',), ('k', '"7"')]
>

It's in the csv format that Excel accepts on input, but that is
irrelevant. The output does not meet the OP's requirements: the reader
has treated the should-have-been-protected comma as a delimiter and
produced FOUR elements instead of THREE. Also note that '"0234' keeps
its leading " and ')#($)@"' keeps its trailing ".
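
For what it's worth, here is a rough stdlib-only sketch using re that
produces the three pairs the OP asked for. The pattern and the names
PAIR_RE and parse_pairs are made up for illustration, not taken from
anyone's post. It assumes keys are plain \w+ words and that quoted
values contain no embedded or escaped double quotes.

```python
import re

# Hypothetical pattern (an assumption, not from the thread):
#   key, '=', then either a double-quoted value or a bare value that
#   runs up to the next comma, followed by a comma or end of string.
PAIR_RE = re.compile(r'\s*(\w+)\s*=\s*(?:"([^"]*)"|([^,]*))\s*(?:,|$)')


def parse_pairs(text):
    """Split 'k=v' items at commas, leaving commas inside "..." alone."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = PAIR_RE.match(text, pos)
        if m is None:
            raise ValueError("cannot parse at offset %d: %r" % (pos, text[pos:]))
        key, quoted, bare = m.groups()
        # Group 2 is set for a quoted value, group 3 for a bare one.
        value = quoted if quoted is not None else bare.strip()
        pairs.append((key, value))
        pos = m.end()
    return pairs


print(parse_pairs('a=1,b="0234,)#($)@", k="7"'))
# [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
```

Matching from an explicit offset, rather than using findall, means
that input the pattern cannot handle raises an error instead of being
silently skipped.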



