help with simple regular expression grouping with re

Tim Peters tim_one at email.msn.com
Sat May 8 01:05:50 EDT 1999


[Bob Horvath]
> Being relatively new to Python, I am trying to do something using re and
> cannot figure out the right pattern to do what I want.

That's OK -- regular expressions are tricky!  Be sure to read

     http://www.python.org/doc/howto/regex/regex.html

for a gentler intro than the reference manual has time to give.

> The input that I am parsing is a typical "mail merge" file, containing
> comma separated fields that are surrounded by double quotes.  A typical
> line is:
>
> "field 1", "field 2","field 3 has is different, it has an embedded
> comma","this one doesn't"
>
> I am trying to get a list of fields that are the strings that are
> between the quotes, including any embedded commas.
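To see why the embedded comma is the hard part, here is a small sketch (my illustration, not from the question) of the obvious approach, splitting on commas. It assumes the example line is really one line and the mail client wrapped it:

```python
# The line from the question, joined back onto one line (the email wrapped it).
line = '"field 1", "field 2","field 3 has is different, it has an embedded comma","this one doesn\'t"'

# str.split knows nothing about quotes, so the embedded comma in
# field 3 splits it in two and leaves the quotes and blanks attached.
naive = line.split(",")
print(len(naive))
for piece in naive:
    print(repr(piece))
```

That gives five pieces instead of four, with field 3 cut in half.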

Note that regexps are utterly unforgiving -- the first two fields in your
example aren't separated by a comma, but by a comma followed by a blank.  I
don't know whether that was a typo or a requirement, so let's write
something that doesn't care <wink>:

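import re

# The example line from the question, on one line (the email wrapped it).
your_example = ('"field 1", "field 2","field 3 has is different, '
                'it has an embedded comma","this one doesn\'t"')

pattern = re.compile(r"""
    "           # match an open quote
    (           # start a group so re.findall returns only this part
        [^"]*?  # match shortest run of non-quote characters
    )           # close the group
    "           # and match the close quote
""", re.VERBOSE)

answer = re.findall(pattern, your_example)
for field in answer:
    print(field)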

That prints:

field 1
field 2
field 3 has is different, it has an embedded comma
this one doesn't
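If the comment about the group seems like magic, here's a quick sketch that checks it directly. It uses compact (non-VERBOSE) spellings of the same pattern; the names are just for illustration. It also shows that the "?" isn't doing any real work here: since [^"] can never match a quote, the greedy and non-greedy forms stop at the same place.

```python
import re

line = '"field 1", "field 2","field 3 has is different, it has an embedded comma","this one doesn\'t"'

# With exactly one group, findall returns only the group's text per match.
with_group = re.findall(r'"([^"]*)"', line)
# With no group, findall returns the whole match, quotes included.
without_group = re.findall(r'"[^"]*"', line)
# Non-greedy version, as in the pattern above; same result on this input.
lazy = re.findall(r'"([^"]*?)"', line)

print(with_group[0])
print(without_group[0])
print(lazy == with_group)
```

The three prints show field 1, then "field 1" with its quotes, then True.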

Just study that until your eyes bleed <wink>.
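For what it's worth, a different technique than the regexp above: later Pythons (2.3 and up) ship a csv module made for exactly this kind of file. A minimal sketch, assuming the same one-line input; skipinitialspace=True is what lets it tolerate the blank after the first comma:

```python
import csv

line = '"field 1", "field 2","field 3 has is different, it has an embedded comma","this one doesn\'t"'

# csv.reader accepts any iterable of lines; a one-element list is enough here.
# skipinitialspace=True skips blanks right after a delimiter, so the quote
# that opens "field 2" is still recognized as a quote.
reader = csv.reader([line], skipinitialspace=True)
fields = next(reader)
for field in fields:
    print(field)
```

Unlike the regexp, the csv reader also treats a doubled quote ("") inside a quoted field as a literal quote character.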

defender-of-python-and-corrupter-of-youth-ly y'rs  - tim