re.sub and empty groups

harvey.thomas at informa.com
Tue Jan 16 08:55:56 EST 2007


Hugo Ferreira wrote:

> Hi!
>
> I'm trying to do a search-replace in places where some groups are
> optional... Here's an example:
>
> >>> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola").groups()
> ('ola', None)
>
> >>> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|").groups()
> ('ola', '')
>
> >>> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|ole").groups()
> ('ola', 'ole')
>
> The second and third results are right, but not the first one, where
> it should be equal to the second (i.e., it should be an empty string
> instead of None). This is because I want to use re.sub() and when the
> group is None, it blows up with a stack trace...
>
> Maybe I'm not getting the essence of groups and non-grouping groups.
> Someone care to explain (and, give the correct solution :)) ?
>
> Thanks in advance,
>
> Hugo Ferreira
>
> --
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85

From the documentation:
groups([default])
Return a tuple containing all the subgroups of the match, from 1 up to
however many groups are in the pattern. The default argument is used
for groups that did not participate in the match; it defaults to None.

Your second group is optional and does not take part in the match in
your first example, so groups() reports it as None. You can, however,
still use this regular expression if you call groups('') rather than
groups().
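
A minimal sketch of that suggestion, assuming the goal is to feed the
groups into re.sub(). The helper name "render" and the "[name|caption]"
output format are made up for illustration; they are not from the
original question.

```python
import re

# Original pattern from the question: the second group is optional.
PATTERN = r"Image:([^\|]+)(?:\|(.*))?"

# groups('') substitutes '' for any group that did not take part
# in the match, instead of the default None.
m = re.match(PATTERN, "Image:ola")
name, caption = m.groups('')

# For re.sub(), a replacement function can apply the same default.
# 'render' and its output format are hypothetical.
def render(match):
    name, caption = match.groups('')
    return "[%s|%s]" % (name, caption)

without_caption = re.sub(PATTERN, render, "Image:ola")
with_caption = re.sub(PATTERN, render, "Image:ola|ole")
```

Passing a function to re.sub() avoids referring to \2 in a replacement
template, which is where an unmatched group causes trouble on older
Python versions. From Python 3.5 on, re.sub() itself replaces unmatched
groups in a template with an empty string.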

A better way is probably to use a simplified regular expression:

re.match(r"Image:([^\|]+)\|?(.*)", "Image:ola").groups()

i.e. match the text "Image:" followed by at least one character not
matching "|" followed by an optional "|" followed by any remaining
characters.
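
As a quick check, here is the simplified expression run against the
three inputs from the question, followed by a re.sub() call. The
"[\1|\2]" replacement template is only an assumed example format.

```python
import re

# Simplified pattern: the "|" is optional, but group 2 is not,
# so it always takes part in the match (possibly as '').
SIMPLE = r"Image:([^\|]+)\|?(.*)"

inputs = ("Image:ola", "Image:ola|", "Image:ola|ole")
results = [re.match(SIMPLE, text).groups() for text in inputs]

# Since group 2 always participates, \2 is safe in a template.
# The "[\1|\2]" format is hypothetical.
replaced = re.sub(SIMPLE, r"[\1|\2]", "Image:ola")
```

Note that (.*) runs to the end of the line in both versions of the
pattern, so in longer text it will take everything after the "|" up to
the next newline.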



