split string at commas respecting quotes when string not in csv format
Tim Chase
python.list at tim.thechases.com
Fri Mar 27 10:19:29 EDT 2009
Paul McGuire wrote:
> On Mar 27, 5:19 am, Tim Chase <python.l... at tim.thechases.com> wrote:
>>>> >>> import re
>>>> >>> s = """a=1,b="0234,)#($)@", k="7" """
>>>> >>> rx = re.compile(r'[ ]*(\w+)=([^",]+|"[^"]*")[ ]*(?:,|$)')
>>>> >>> rx.findall(s)
>>>> [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]
>>>> >>> rx.findall('a=1, *DODGY*SYNTAX* b=2')
>>>> [('a', '1'), ('b', '2')]
>>> I'm going to save this one and study it, too. I'd like to learn
>>> to use regexes better, even if I do try to avoid them when possible :)
>> This regexp is fairly close to the one I used, but I employed the
>> re.VERBOSE flag to split it out for readability. The above
>> breaks down as
>>
>>   [ ]*         # optional whitespace, traditionally "\s*"
>>   (\w+)        # tag the variable name as one or more "word" chars
>>   =            # the literal equals sign
>>   (            # tag the value
>>     [^",]+     # one or more non-[quote/comma] chars
>>     |          # or
>>     "[^"]*"    # quotes around a bunch of non-quote chars
>>   )            # end of the value being tagged
>>   [ ]*         # same as previously, optional whitespace ("\s*")
>>   (?:          # a non-capturing group (why?)
>>     ,          # a literal comma
>>     |          # or
>>     $          # the end-of-line/string
>>   )            # end of the non-capturing group
>
> Mightn't there be whitespace on either side of the '=' sign? And if
> you are using findall, why is the bit with the delimiting commas or
> end of line/string necessary? I should think findall would just skip
> over this stuff, like it skips over *DODGY*SYNTAX* in your example.
Which would leave you with the solution(s) fairly close to what I
originally posited ;-)
(My comment about the "non-capturing group (why?)" referred to
not needing to match the EOL/comma at all, because findall()
doesn't need it, as Paul points out. It was not about the
precedence of the "|" operator.)
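
For illustration, here is a minimal sketch of what the pattern might look like with both of Paul's suggestions applied: whitespace allowed around the '=' and the trailing comma/end-of-string group dropped. It is not the exact regexp from earlier in the thread. The name PAIR_RX is made up for this example, and excluding whitespace from unquoted values (so that "a = 1 ," yields '1' rather than '1 ') is an assumption about the desired behaviour.

```python
import re

# PAIR_RX is a hypothetical name; this is a sketch, not the original pattern.
PAIR_RX = re.compile(r'''
    (\w+)          # tag the variable name as one or more "word" chars
    \s*=\s*        # the literal equals sign, optional whitespace either side
    (              # tag the value
        "[^"]*"    #   quotes around a bunch of non-quote chars
      |            #   or
        [^",\s]+   #   one or more non-[quote/comma/whitespace] chars (assumption)
    )              # end of the value being tagged
''', re.VERBOSE)

s = 'a=1,b="0234,)#($)@", k="7" '
print(PAIR_RX.findall(s))
# [('a', '1'), ('b', '"0234,)#($)@"'), ('k', '"7"')]

# findall() simply skips over anything that does not match
print(PAIR_RX.findall('a = 1, *DODGY*SYNTAX* b = 2'))
# [('a', '1'), ('b', '2')]
```

Trying the quoted alternative first keeps commas inside quotes from splitting a value. An unquoted value containing spaces would be cut at the first space under this assumption.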
-tkc