[Python-Dev] regex module

MRAB python at mrabarnett.plus.com
Wed Jan 13 01:22:01 CET 2010


Terry Reedy wrote:
> On 1/12/2010 5:10 PM, MRAB wrote:
>> Hi all,
>>
>> I'm back on the regex module after doing other things and I'd like your
>> opinion on a number of matters:
>>
>> Firstly, the current re module has a bug whereby it doesn't split on
>> zero-width matches. The BDFL has said that this behaviour should be
>> retained by default in case any existing software depends on it. My
>> question is: should my regex module still do this for Python 3?
>> Speaking personally, I'd like it to behave correctly, and Python 3 is
>> the version where backwards-compatibility is allowed to be broken.
> 
> Are you writing a new module with a new name? If so, do you expect it to 
> replace or augment re? (This is the same question as for optparse vs. 
> argparse, which I understand to not yet be decided.)
>
It's a module called 'regex'. It can be used in place of 're' by using
"import regex as re", except for differences such as "\g<name>" being a
legal group reference in pattern strings.
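
To make the first question concrete, here is a minimal sketch of what "splitting on zero-width matches" means. It uses only re.finditer from the stdlib; the helper name split_zero_width is made up for illustration and is not part of 're' or 'regex', and unlike re.split it ignores capture groups and maxsplit.

```python
import re

def split_zero_width(pattern, string):
    """Split string at every match of pattern, including empty matches.

    Hypothetical helper for illustration only; capture groups are ignored.
    """
    pieces = []
    last = 0
    for match in re.finditer(pattern, string):
        # Keep the text between the previous match and this one.
        pieces.append(string[last:match.start()])
        last = match.end()
    # Keep whatever follows the final match.
    pieces.append(string[last:])
    return pieces

# A lookahead matches the empty string before each capital letter.
print(split_zero_width(r"(?=[A-Z])", "camelCaseString"))  # ['camel', 'Case', 'String']

# Ordinary (non-empty) separators behave as usual.
print(split_zero_width(r",\s*", "a, b,c"))  # ['a', 'b', 'c']
```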
>>
>> Secondly, Python 2 is reaching the end of the line and Python 3 is the
>> future. Should I still release a version that works with Python 2? I'm
>> thinking that it could be confusing if the new regex module did zero-width
>> splits correctly in Python 3 but not in Python 2. And also, should I
>> release it only for Python 3 as a 'carrot'?
> 
> 2.7 is in alpha with no plans for 2.8, so unless you finish real soon, 
> 2.7 stdlib is already out. A new engine should get some community 
> testing before going in the stdlib. Even 3.2 beta is not that far off 
> (8-9 months?) Do *you* want to do the extra work for a 2.x release on PyPI?
> 
>> Finally, the module allows some extra backslash escapes, eg \g<name>, in
>> the pattern. Should it treat ill-formed escapes, eg \g, as it would have
>> treated them in the re module?
> 
> What does re do with analogous cases?
> 
The 're' module treats r"\g" as "g"; both 're' and 'regex' treat, say,
r"\q" as "q". The closest analogue to what I'm asking about is that re
treats the ill-formed repeat r"x{1," as a literal, which sort of
suggests that an ill-formed r"\g" should still be treated as "g", even
though a well-formed r"\g<name>" is now a group reference (re would
treat that as "g<name>"). Does that sound reasonable?
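
The repeat analogue can be checked with a short sketch using only the stdlib 're' module. The variable names are illustrative. The last line shows re's own spelling of a named backreference, (?P=name), for comparison; the \g<name> form in a pattern belongs to the 'regex' module and is not exercised here.

```python
import re

# Well-formed repeat: "x{1,}" means one or more "x".
well_formed = re.fullmatch(r"x{1,}", "xxx")

# Ill-formed repeat: with no closing brace, "x{1," is taken as literal text,
# so it matches the four characters x { 1 , and nothing else.
ill_formed_as_literal = re.fullmatch(r"x{1,", "x{1,")
ill_formed_as_repeat = re.fullmatch(r"x{1,", "xxx")

# Named backreference as spelled by the re module.
named_backref = re.fullmatch(r"(?P<word>\w+) (?P=word)", "spam spam")

print(well_formed is not None)            # True
print(ill_formed_as_literal is not None)  # True
print(ill_formed_as_repeat is not None)   # False
print(named_backref.group("word"))        # spam
```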


