Using re - side effects or misunderstanding

Andrew Henshaw andrew_dot_henshaw_at_earthling_dot_net
Sun Jan 14 13:58:32 EST 2001


This is all part of my question about side effects.  As soon as I introduce
grouping, the behavior of findall changes dramatically: using groups purely
for grammatical purposes has the side effect of changing the return type.
I have a class that uses passed-in re's.  Internal to the class, I'm using
findall against those re's.  Because the user may want to use groups for
grammatical purposes only (he doesn't care what that does to findall), I
have to program in a manner that reduces performance.  I have to either
apply string.join to all return values (which seems pretty extravagant when
the return value is already a string), or check the return type.  I was
asking the question to see if I was missing something, but apparently I
wasn't.
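
For illustration, the return-type check could look roughly like the sketch
below.  It is written for current Python rather than the interpreter of the
day, and the helper name findall_as_strings is hypothetical, not part of re.

```python
import re

def findall_as_strings(pattern, text):
    """Return findall results as a flat list of strings.

    With zero or one capturing group, findall already yields strings.
    With two or more groups it yields tuples, which are joined here.
    """
    results = []
    for item in pattern.findall(text):
        if isinstance(item, tuple):
            # Several groups: glue the captured pieces together.
            item = "".join(item)
        results.append(item)
    return results

text = "..abcxyz.."
print(findall_as_strings(re.compile("(?:ab)(?:c)xyz"), text))  # ['abcxyz']
print(findall_as_strings(re.compile("(ab)(?:c)xyz"), text))    # ['ab']
print(findall_as_strings(re.compile("(ab)c(xyz)"), text))      # ['abxyz']
```

The check runs once per match, which is the overhead I would rather avoid.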

Would not an acceptable solution be to define a new re compile flag that
indicates that grouping only serves a grammatical function?
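
Short of such a flag, the effect can be approximated by asking each match
object for its whole text.  The sketch below assumes re.finditer, which
postdates this message, and the helper name findall_whole is hypothetical.

```python
import re

def findall_whole(pattern, text):
    """Return the full text of every match, ignoring any groups.

    group(0) is always the entire match, so the result is a list of
    strings no matter how many groups the caller's pattern contains.
    """
    return [m.group(0) for m in pattern.finditer(text)]

# Groups used only to express repetition and structure.
pattern = re.compile("(ab)+(c)xyz")
text = "..abcxyz..ababcxyz"

print(pattern.findall(text))         # [('ab', 'c'), ('ab', 'c')]
print(findall_whole(pattern, text))  # ['abcxyz', 'ababcxyz']
```

This gives the whole match rather than the joined groups, so it only helps
when the groups really are there for grammar alone.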

AH


"Sune Kirkeby" <sune at interspace.dk> wrote in message
news:87y9wel7fm.fsf at sune.interspace.dk...
>
> [ "Andrew Henshaw" <andrew_dot_henshaw_at_earthling_dot_net> ]
> > But I thought that ?: is for matching but not returning.  And so it is,
> > under certain circumstances, e.g.
>
> It always is, and it always does.
>
> > if my pattern is
> >     '(ab)(c)xyz'
> > I get
> >     [('ab', 'c')]   (Yikes! a tuple. I'm going to have to change my
> > code a bit to handle this)
> > but
> >     '(ab)(?:c)xyz'
> > yields,
> >     ['ab']
> > and
> >     '(?:ab)(?:c)xyz'
> > gives
> >     ['abcxyz']
>
> All of the above are what one would expect, since re.findall returns
> a list of whole matches if there are no groups in the re.  But if there
> are groups it will _only_ return the matched groups: a list of strings
> for a single group, or a list of tuples for more than one.
>
> Note that (?:...) is non-grouping, so in '(?:ab)(?:c)xyz' there are
> no groups, but in '(ab)(?:c)xyz' there is one group, which will then
> be returned.
>
> > so how do I get the result
> >     ['abxyz']
> > ??
>
> Something along the lines of,
>
> >>> r = re.compile('(ab)c(xyz)')
> >>> matches = r.findall('..abcxyz..')
> >>> matches
> [('ab', 'xyz')]
>
> almost there, just have to join the tuples,
>
> >>> map(lambda l: string.join(l, ''), matches)
> ['abxyz']
>
> Voila!
>
> > In other words, adding groups for the purpose of adding repetitions
> > seems to have a greater side-effect than I would desire.  Is there
> > something that I'm missing in my use of re's?
>
> Yes, you weren't using groups (all the time, anyway)  :-).
>
> first-posting-ly yrs'
>
> --
> Sune Kirkeby                    | "Imagine, if you will, that there were no
> http://mel.interspace.dk/~sune/ | such thing as a hypothetical situation..."




