[issue2636] Adding a new regex module (compatible with re)

Matthew Barnett report at bugs.python.org
Thu Sep 1 19:50:50 CEST 2011


Matthew Barnett <python at mrabarnett.plus.com> added the comment:

The regex module supports nested sets and set operations, e.g. r"[[a-z]--[aeiou]]" (the letters from 'a' to 'z', except the vowels). This means that a literal '[' in a set needs to be escaped.
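The stdlib re module has no set operations. For a rough comparison, the same "a to z except the vowels" set can be sketched there by putting a negative lookahead in front of a plain range. This is a swapped-in technique, not how the regex module implements `--`. The names `CONSONANT` and `expected` and the sample string are illustrative assumptions.

```python
import re
import string

# Stand-in for r"[[a-z]--[aeiou]]": reject a vowel with a negative
# lookahead, then accept any remaining lowercase ASCII letter.
CONSONANT = re.compile(r"(?![aeiou])[a-z]")

# The same difference computed with Python sets, for comparison.
expected = set(string.ascii_lowercase) - set("aeiou")

matched = {c for c in string.ascii_lowercase if CONSONANT.fullmatch(c)}
print(matched == expected)          # True
print(CONSONANT.findall("regex"))   # ['r', 'g', 'x']
```

The lookahead trick only covers difference. Intersection and nested unions get awkward quickly, and that is one reason a dedicated set syntax is convenient.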

For example, the re module sees "[][()]..." as:

    [      start of set
     ]     literal ']'
     [()   literals '[', '(', ')'
    ]      end of set
    ...    ...
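The re reading above can be checked with Python 3's standard-library re. In this check, the unescaped set and the escaped spelling `[]\[()]` match the same four characters. The sample string is an arbitrary assumption.

```python
import re

# In re, "[][()]" is one flat set: the leading ']' is a literal because it
# comes first, and '[' has no special meaning inside a set.
unescaped = re.compile(r"[][()]")
# Escaping '[' gives the same set under re.
escaped = re.compile(r"[]\[()]")

sample = "a]b[c(d)e"
print(unescaped.findall(sample))  # [']', '[', '(', ')']
print(escaped.findall(sample))    # [']', '[', '(', ')']
```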

but the regex module sees it as:

    [      start of set
     ]     literal ']'
     [()]  nested set [()]
     ...   ...

Thus:

>>> s = u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[][()]) |(?!,) (?!\[,)(?=[][(),])', '', s)
u'void foo ( type arg1 [, type arg2 ] )'
>>> regex.sub(r'(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])', '', s)
u'void foo(type arg1 [, type arg2])'
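For comparison, here is a sketch that runs both patterns from the session through Python 3's standard-library re instead of regex. re has no nested sets, so both spellings should behave the same there. This does not reproduce the regex module's behaviour. The names `unescaped` and `escaped` are only illustrative.

```python
import re

s = "void foo ( type arg1 [, type arg2 ] )"

# The two spellings from the session: '[' unescaped, then escaped.
unescaped = r"(?<=[][()]) |(?!,) (?!\[,)(?=[][(),])"
escaped = r"(?<=[]\[()]) |(?!,) (?!\[,)(?=[]\[(),])"

# re treats '[' inside a set as a literal, so both patterns mean the same
# thing here and both remove the spaces next to brackets and parentheses.
print(re.sub(unescaped, "", s))  # void foo(type arg1 [, type arg2])
print(re.sub(escaped, "", s))    # void foo(type arg1 [, type arg2])
```

The escaped form gives the intended result under both modules, so it is the portable spelling.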

If the regex module can't parse a set as a nested set, it tries again as a non-nested set (as re does), but there are bound to be patterns that could validly be parsed either way.


_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2636>
_______________________________________

