help with simple regular expression grouping with re
Tim Peters
tim_one at email.msn.com
Sat May 8 01:05:50 EDT 1999
[Bob Horvath]
> Being relatively new to Python, I am trying to do something using re and
> cannot figure out the right pattern to do what I want.
That's OK -- regular expressions are tricky! Be sure to read
http://www.python.org/doc/howto/regex/regex.html
for a gentler intro than the reference manual has time to give.
> The input that I am parsing is a typical "mail merge" file, containing
> comma separated fields that are surrounded by double quotes. A typical
> line is:
>
> "field 1", "field 2","field 3 has is different, it has an embedded
> comma","this one doesn't"
>
> I am trying to get a list of fields that are the strings that are
> between the quotes, including any embedded commas.
Note that regexps are utterly unforgiving -- the first two fields in your
example aren't separated by a comma, but by a comma followed by a blank. I
don't know whether that was a typo or a requirement, so let's write
something that doesn't care <wink>:
import re
pattern = re.compile(r"""
    "           # match an open quote
    (           # start a group so re.findall returns only this part
        [^"]*?  # match shortest run of non-quote characters
    )           # close the group
    "           # and match the close quote
""", re.VERBOSE)
answer = re.findall(pattern, your_example)
for field in answer:
    print(field)
That prints:
field 1
field 2
field 3 has is different, it has an embedded comma
this one doesn't
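For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample line from the question has been joined back onto a single line; the names `line` and `split_quoted_fields` are made up for illustration and are not part of the original post.

```python
import re

# Sample input reconstructed from the question (the wrapped line is rejoined).
# Note the first separator is ", " and the others are plain ",".
line = ('"field 1", "field 2","field 3 has is different, '
        'it has an embedded comma","this one doesn\'t"')

# Same pattern as above: whatever sits between a pair of double quotes.
pattern = re.compile(r"""
    "           # match an open quote
    (           # start a group so findall returns only this part
        [^"]*?  # match shortest run of non-quote characters
    )           # close the group
    "           # and match the close quote
""", re.VERBOSE)

def split_quoted_fields(text):
    """Return the contents of each double-quoted field (hypothetical helper)."""
    return pattern.findall(text)

fields = split_quoted_fields(line)
for field in fields:
    print(field)
```

Because the pattern only looks at the quotes, it does not care whether fields are separated by "," or ", ". It also makes no attempt to handle a quote character embedded inside a field.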
Just study that until your eyes bleed <wink>.
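One detail worth studying is the group. The short sketch below contrasts re.findall with and without the parentheses; the input string `sample` is a made-up two-field example, not taken from the original data.

```python
import re

sample = '"a, b", "c"'  # hypothetical input with an embedded comma

# With a group, findall returns only what the group matched.
with_group = re.findall(r'"([^"]*)"', sample)

# Without a group, findall returns the whole match, quotes included.
without_group = re.findall(r'"[^"]*"', sample)

print(with_group)
print(without_group)
```

The first call yields the bare field text, while the second keeps the surrounding quotes, which is why the pattern wraps the interesting part in parentheses.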
defender-of-python-and-corrupter-of-youth-ly y'rs - tim