Groups in regular expressions don't repeat as expected

Neil Cerutti neilc at norwich.edu
Thu Apr 21 09:16:36 EDT 2011


On 2011-04-20, John Nagle <nagle at animats.com> wrote:
>      Findall does something a bit different. It returns a list of
> matches of the entire pattern, not repeats of groups within
> the pattern.
>
>      Consider a regular expression for matching domain names:
>
> >>> kre = re.compile(r'^([a-zA-Z0-9\-]+)(?:\.([a-zA-Z0-9\-]+))+$')
> >>> s = 'www.example.com'
> >>> ms = kre.match(s)
> >>> ms.groups()
> ('www', 'com')
> >>> msall = kre.findall(s)
> >>> msall
> [('www', 'com')]
>
> This is just a simple example.  But it illustrates an unnecessary
> limitation.  The matcher can do the repeated matching; you just can't
> get the results out.

Thanks for the further explanation.

Assuming a fake API that returned multiple group matches as a
tuple:

>>? print(re.match(r"^([a-z])+$", "abcdef").groups())
(('a', 'b', 'c', 'd', 'e', 'f'),)
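
For comparison, here is a minimal sketch of what the real re module
does with that same pattern today. The variable name last_only is
just mine for illustration; the point is that a repeated group keeps
only its final repetition:

```python
import re

# Same pattern and input as the fake-API example above.
m = re.match(r"^([a-z])+$", "abcdef")

# The group matched six times, but only the last repetition is kept.
last_only = m.groups()
print(last_only)  # ('f',)
```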

I was thinking of applying findall something like this, but you
have to make multiple calls:

>>> s = "abcdef"
>>> m = re.match(r"^[a-z]+$", s)
>>> if m:
...   print(re.findall(r"[a-z]", m.group()))
...
['a', 'b', 'c', 'd', 'e', 'f']
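
Applied to John's domain-name case, the same two-step idea might
look roughly like the sketch below. The names LABEL, WHOLE and
domain_labels are hypothetical helpers I made up for illustration,
not anything in the re module: first validate the whole string, then
pull out the repeated pieces with a second pass.

```python
import re

# One domain label, using the same character class as John's pattern.
LABEL = r"[a-zA-Z0-9\-]+"

# Whole-string check: a label followed by one or more ".label" parts.
WHOLE = re.compile(r"^%s(?:\.%s)+$" % (LABEL, LABEL))

def domain_labels(s):
    """Return every label in s, or None if s is not a dotted name."""
    if WHOLE.match(s) is None:
        return None
    # Second pass: collect each label that the first pass validated.
    return re.findall(LABEL, s)

print(domain_labels("www.example.com"))  # ['www', 'example', 'com']
```

It gets all the labels out, but it still costs two passes over the
string and two patterns that have to be kept in sync.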

I can see that getting really annoying. Is there a better way to
make multiple group matches accessible without adding a third
element type (a tuple, alongside str and None) to what groups()
returns?

-- 
Neil Cerutti


