help with simple regular expression grouping with re

Bob Horvath bob at horvath.com
Sun May 9 00:21:12 EDT 1999


Tim Peters wrote:

> [Bob Horvath]
> > Being relatively new to Python, I am trying to do something using re and
> > cannot figure out the right pattern to do what I want.
>
> That's OK -- regular expressions are tricky!  Be sure to read
>
>      http://www.python.org/doc/howto/regex/regex.html
>
> for a gentler intro than the reference manual has time to give.
>

Thanks, it is a little easier to follow.

>
> > The input that I am parsing is a typical "mail merge" file, containing
> > comma separated fields that are surrounded by double quotes.  A typical
> > line is:
> >
> > "field 1", "field 2","field 3 has is different, it has an embedded
> > comma","this one doesn't"
> >
> > I am trying to get a list of fields that are the strings that are
> > between the quotes, including any embedded commas.
>
> Note that regexps are utterly unforgiving -- the first two fields in your
> example aren't separated by a comma, but by a comma followed by a blank.  I
> don't know whether that was a typo or a requirement, so let's write
> something that doesn't care <wink>:

It was a typo.  The commas that separate fields do not have blanks around
them, and there are no blanks or other white space outside the double-quoted
fields.
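Given that clarification, a regular expression is not strictly required. The sketch below assumes every field is wrapped in double quotes, fields are separated by a bare comma, and no field contains a double quote of its own. `split_quoted_fields` is a hypothetical helper name. It uses string methods, which arrived in Python 2.0, so on 1.5.x the `string.strip` and `string.split` functions would stand in for them.

```python
def split_quoted_fields(line):
    """Split a line like '"a","b,c","d"' into ['a', 'b,c', 'd'].

    Hypothetical helper. Assumes every field is double-quoted, fields are
    separated by a bare comma, and no field contains a double quote.
    """
    line = line.strip()  # drop the trailing newline and surrounding whitespace
    if len(line) < 2 or line[0] != '"' or line[-1] != '"':
        raise ValueError("line is not a sequence of double-quoted fields")
    # Remove the outermost quotes; the only remaining '","' sequences are
    # the field separators, so embedded commas are left alone.
    return line[1:-1].split('","')


example = '"field 1","field 2","field 3 has is different, it has an embedded comma","this one doesn\'t"'
for field in split_quoted_fields(example):
    print(field)
```

If a field could ever contain a double quote, this approach breaks, and so does the regex below.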

>
>
> import re
> pattern = re.compile(r"""
>     "           # match an open quote
>     (           # start a group so re.findall returns only this part
>         [^"]*?  # match shortest run of non-quote characters
>     )           # close the group
>     "           # and match the close quote
> """, re.VERBOSE)
>
> answer = re.findall(pattern, your_example)
> for field in answer:
>     print field
>
> That prints:
>
> field 1
> field 2
> field 3 has is different, it has an embedded comma
> this one doesn't
>
> Just study that until your eyes bleed <wink>.
>

Well, I did a lot of searching around before and after my original post, and
while findall seems to be what I want, I am using Python 1.5.1, which apparently
does not have it.  I can upgrade my Linux system, but the system where it will
ultimately run might be a different story.

Is there a way to do the equivalent of findall on releases prior to having it?
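One way to get the equivalent without findall is to call the compiled pattern's `search` method repeatedly, passing the position where the previous match ended. The helper below, `findall_compat`, is a hypothetical name, and it assumes (as with Tim's pattern) that the pattern has exactly one group. The real findall instead returns whole matches when there are no groups and tuples when there are several. It also assumes `search` accepts a starting position, as the `re` documentation describes. The pattern used here is the compact, non-verbose form of Tim's.

```python
import re


def findall_compat(pattern, text):
    """Return group 1 of every non-overlapping match, scanning left to right.

    Hypothetical stand-in for findall; assumes the pattern has exactly one group.
    """
    results = []
    pos = 0
    while pos <= len(text):
        m = pattern.search(text, pos)  # search starting at offset pos
        if m is None:
            break
        results.append(m.group(1))
        # Resume after this match; step one character past an empty match
        # so the loop always makes progress.
        if m.end() == m.start():
            pos = m.end() + 1
        else:
            pos = m.end()
    return results


pattern = re.compile(r'"([^"]*?)"')  # same as Tim's VERBOSE pattern
example = '"field 1","field 2","field 3 has is different, it has an embedded comma","this one doesn\'t"'
for field in findall_compat(pattern, example):
    print(field)
```

Because every match of this pattern consumes at least the two quote characters, the empty-match guard never fires here; it only matters for patterns that can match an empty string.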

Downloading-a-new-version-now-to-see-if-there-is-a-re.findall.py,
Bob




