split string at commas respecting quotes when string not in csv format

Tim Chase python.list at tim.thechases.com
Thu Mar 26 16:30:18 EDT 2009


> The challenge is to turn a string like this:
> 
>     a=1,b="0234,)#($)@", k="7"
> 
> into this:
> 
>     [("a", "1"), ("b", "0234,)#($)@"), ("k", "7")]
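
For context, here is a quick check of why the obvious tools don't do this, assuming Python's standard csv module with its default dialect. A plain split on commas breaks the quoted value apart. csv only treats a quote as special when it opens a field, so the quote after b= is an ordinary character there and gives the same result:

```python
import csv

s = 'a=1,b="0234,)#($)@", k="7"'

# Naive split: the comma inside the quoted value is treated as a separator too.
naive = s.split(',')

# csv recognizes a quote only at the start of a field, so the quote after
# "b=" is kept literally and the embedded comma still splits the field.
via_csv = next(csv.reader([s]))

print(naive)    # ['a=1', 'b="0234', ')#($)@"', ' k="7"']
print(via_csv)  # ['a=1', 'b="0234', ')#($)@"', ' k="7"']
```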

A couple of solutions "work," depending on how pathological the input 
data gets:

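   import re
   s = 'a=1,b="0234,)#($)@", k="7"'
   r = re.compile(r"""
     (?P<varname>\w+)       # the key
     \s*=\s*(?:
     "(?P<quoted>[^"]*)"    # a double-quoted value (commas allowed inside)
     |
     (?P<unquoted>[^,]+)    # or anything up to the next comma
     )
     """, re.VERBOSE)
   results = [
     (m.group('varname'),
       # test against None so an empty quoted value ("") is kept as ''
       m.group('quoted') if m.group('quoted') is not None
       else m.group('unquoted')
     )
     for m in r.finditer(s)
     ]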
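
In the first pattern, the quoted and unquoted groups sit in an alternation, so the match object's lastgroup attribute names whichever of the two actually matched, and the value can be fetched without testing each group. A small sketch of that variation follows. The function name pairs_by_lastgroup is made up for illustration. It also keeps an empty quoted value such as x="" as an empty string:

```python
import re

PAIR = re.compile(r"""
  (?P<varname>\w+)
  \s*=\s*(?:
  "(?P<quoted>[^"]*)"
  |
  (?P<unquoted>[^,]+)
  )
  """, re.VERBOSE)

def pairs_by_lastgroup(s):
    # lastgroup is the name of the last capturing group to close, which here is
    # always 'quoted' or 'unquoted' (varname closes before either of them).
    return [(m.group('varname'), m.group(m.lastgroup)) for m in PAIR.finditer(s)]

print(pairs_by_lastgroup('a=1,b="0234,)#($)@", k="7"'))
# [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
print(pairs_by_lastgroup('x="", y=2'))
# [('x', ''), ('y', '2')]
```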

############### or ##############################

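   r = re.compile(r"""
     (\w+)                  # group 1: the key
     \s*=\s*(
     "(?:[^"]*)"            # group 2: a quoted value, quotes included
     |
     [^,]+                  # or anything up to the next comma
     )
     """, re.VERBOSE)
   results = [
     (m.group(1), m.group(2).strip('"'))  # strip the surrounding double quotes
     for m in r.finditer(s)
     ]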
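
Because the second pattern has exactly two capturing groups, re.findall returns (key, value) tuples directly. The same idea therefore fits in a couple of lines, with the pattern written compactly rather than in VERBOSE form:

```python
import re

s = 'a=1,b="0234,)#($)@", k="7"'

# With more than one group in the pattern, findall yields one tuple per match.
raw = re.findall(r'(\w+)\s*=\s*("[^"]*"|[^,]+)', s)
results = [(key, value.strip('"')) for key, value in raw]

print(raw)      # [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
print(results)  # [('a', '1'), ('b', '0234,)#($)@'), ('k', '7')]
```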

Internal quoting, whether backslash-escaped (b="123\"456") or doubled 
(c="123""456"), would require a slightly smarter parser; neither regex 
above handles it.
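
One possible shape for that smarter parser is sketched below. It assumes that a quoted value may contain either backslash escapes (\") or doubled quotes (""), as in the two examples, and that an unquoted value runs up to the next comma. The names PAIR_RE, unescape and parse_pairs are hypothetical:

```python
import re

# Assumed rules: inside double quotes, \x stands for a literal x and "" for a
# literal double quote; an unquoted value runs up to the next comma.
PAIR_RE = re.compile(r'''
    (?P<varname>\w+)
    \s*=\s*
    (?:
      "(?P<quoted>(?:[^"\\]|\\.|"")*)"   # plain chars, backslash escapes, or ""
      |
      (?P<unquoted>[^,]*)                # everything up to the next comma
    )
    ''', re.VERBOSE)

def unescape(value):
    # One left-to-right pass: \x becomes x, and "" becomes a single quote.
    return re.sub(r'\\(.)|""', lambda m: m.group(1) or '"', value)

def parse_pairs(s):
    pairs = []
    for m in PAIR_RE.finditer(s):
        if m.group('quoted') is not None:
            value = unescape(m.group('quoted'))
        else:
            value = m.group('unquoted').strip()
        pairs.append((m.group('varname'), value))
    return pairs

print(parse_pairs('b="123\\"456", c="123""456"'))
# [('b', '123"456'), ('c', '123"456')]
```

The sample string from the original question still parses the same way with this version.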

-tkc