Regular expression query

Tim Chase python.list at tim.thechases.com
Sun Mar 12 14:51:46 EDT 2017


On 2017-03-12 09:22, rahulrasal at gmail.com wrote:
> aaaaa,bbbbb,ccccc "4873898374", ddddd, eeeeee "3343,23,23,5,,5,45",
> fffff "5546,3434,345,34,34,5,34,543,7"
> 
> It is a comma-separated string, but some of the fields have a double
> quoted string as part of them (and that double quoted string can have
> commas). The above string has only 6 fields. First is aaaaa, second is
> bbbbb and last is fffff "5546,3434,345,34,34,5,34,543,7". How can I
> split this string into its fields using a regular expression? Or even
> if there is any other way to do this, please speak out.

Your desired output seems to silently ignore the spaces after the
commas (e.g. why is it "ddddd" instead of " ddddd"?).  You also don't
mention what should happen in the event there's an empty field:

   aaa,,ccc,ddd "ee",ff

For a close approximation, you might try

import re
instr = 'aaaaa,bbbbb,ccccc "4873898374", ddddd, eeeeee "3343,23,23,5,,5,45", fffff "5546,3434,345,34,34,5,34,543,7"'

desired = [
    "aaaaa",
    "bbbbb",
    'ccccc "4873898374"',
    "ddddd",
    'eeeeee "3343,23,23,5,,5,45"',
    'fffff "5546,3434,345,34,34,5,34,543,7"',
    ]

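# (?!,|$)  : never start a match on a comma or at end-of-string
# "[^"]*"  : a double-quoted run, which may itself contain commas
# [^,]     : any other single non-comma character
r = re.compile(r'(?!,|$)(?:"[^"]*"|[^,])*')
# strip each match because of the aforementioned leading-space issue
result = [s.strip() for s in r.findall(instr)]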
assert len(result) == len(desired), str(result)
assert result == desired, str(result)



It doesn't address the empty field issue, but it's at least a start.
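
If empty fields do need to survive, one possible direction is to drop the
regex and scan the string by hand, treating a comma as a separator only
when it falls outside double quotes. The following is a minimal sketch
under that assumption, not the findall approach above; the name
split_fields is hypothetical, and it assumes quotes are always balanced
and never escaped.

```python
def split_fields(line):
    """Split on commas outside double quotes, keeping empty fields."""
    fields = []
    current = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            # every double quote toggles the "inside quotes" state
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ',' and not in_quotes:
            # an unquoted comma ends the current field, even an empty one
            fields.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    # whatever remains after the last separator is the final field
    fields.append(''.join(current).strip())
    return fields


print(split_fields('aaa,,ccc,ddd "ee",ff'))
# prints ['aaa', '', 'ccc', 'ddd "ee"', 'ff']
```

Unlike the findall version, this keeps the empty second field, at the
cost of also returning an empty final field if the input ends in a comma.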

-tkc






