[issue7132] Regexp: capturing groups in repetitions

Philippe Verdy report at bugs.python.org
Thu Oct 15 02:13:59 CEST 2009


Philippe Verdy <verdy_p at wanadoo.fr> added the comment:

> a "general" regex (e.g. for an ipv6 address)

I know this problem, and I have already written about it. An IPv6 
address can be parsed with a single regexp only if that regexp is 
written without repetitions. But in that case, the regexp becomes really 
HUGE, and the number of groups in the returned match object is 
prohibitive. That's why a specific module for IPv6 addresses had to be 
written for Perl on CPAN.

Such a module can be reduced to just a couple of lines with a single 
regexp, if the regexp engine makes its capturing groups return ALL their 
occurrences: no further processing or analysis is required, and the data 
can be reassembled cleanly, just from the returned groups (as lists):
- \1 and \2 (for hex components of IPv6 in hex format only, where \1 can 
occur 0 or 1 time, and \2 can occur 0 to 7 times)
- or from \1 to \2 and \3 to \4 (for hex components in \1..\2, where \1 
occurs 0 or 1 time and \2 occurs 0 to 5 times, and for decimal 
components in \3..\4, where \3 occurs 1 time and \4 occurs exactly 3 
times).
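
To illustrate the hex-only case, here is a minimal sketch in Python. It 
assumes a simplified address with exactly eight hex components and no 
"::" compression. The names FULL_IPV6, HEX_GROUP and hex_components are 
hypothetical and are not part of any proposal in this issue. The sketch 
shows that the current re module keeps only the last occurrence of the 
repeated group \2, so a second pass is needed to collect all components:

```python
import re

# Assumption: simplified full form, eight hex components, no "::" shorthand.
# Group 1 matches once; group 2 sits inside a repetition that runs 7 times.
FULL_IPV6 = re.compile(r'([0-9A-Fa-f]{1,4})(?::([0-9A-Fa-f]{1,4})){7}')

# Second regexp used only to re-extract every component after validation.
HEX_GROUP = re.compile(r'[0-9A-Fa-f]{1,4}')


def hex_components(address):
    """Return the list of hex components, or None if the address is invalid."""
    if FULL_IPV6.fullmatch(address) is None:
        return None
    # Extra pass needed because the match object does not keep
    # every occurrence of the repeated group.
    return HEX_GROUP.findall(address)


m = FULL_IPV6.fullmatch('2001:db8:0:0:0:0:0:1')
# Only the LAST occurrence of group 2 survives in the match object.
print(m.groups())
print(hex_components('2001:db8:0:0:0:0:0:1'))
```

If repeated groups returned all their occurrences as lists, the second 
regexp and the extra pass in hex_components would not be needed.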

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7132>
_______________________________________

