[Python-Dev] Should we move to replace re with regex?

Antoine Pitrou solipsis at pitrou.net
Sat Aug 27 12:09:29 CEST 2011


On Sat, 27 Aug 2011 09:18:14 +0200
"Martin v. Löwis" <martin at v.loewis.de> wrote:
> Am 27.08.2011 08:33, schrieb Terry Reedy:
> > On 8/26/2011 9:56 PM, Antoine Pitrou wrote:
> > 
> >> Another "interesting" question is whether it's easy to port to the PEP
> >> 393 string representation, if it gets accepted.
> > 
> > Will the re module need porting also?
> 
> That's a quality-of-implementation issue (in both cases). In principle,
> the modules should continue to work unmodified, and indeed SRE does.
> However, the module will then match on Py_UNICODE, which may be
> expensive to produce, and may not meet your expectations of surrogate
> pair handling.
> 
> So realistically, the module should be ported, which has the challenge
> that matching needs to operate on three different representations. The
> modules already support two representations (unsigned char and
> Py_UNICODE), but probably switching on type, not on state.
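
To make the two concerns above concrete, here is a rough Python sketch; it
is not code from SRE or regex. `pep393_kind` is a hypothetical helper that
mirrors how PEP 393 is described as choosing 1, 2 or 4 bytes per character
from the largest code point in the string. `utf16_units` (also hypothetical)
shows why a matcher working on 16-bit Py_UNICODE units would see a non-BMP
character as two surrogate units rather than one character.

```python
def pep393_kind(s):
    """Bytes per character a PEP 393 string would use (illustrative helper)."""
    top = max(map(ord, s), default=0)
    if top < 0x100:        # fits in one byte (Latin-1 range)
        return 1
    if top < 0x10000:      # fits in two bytes (BMP)
        return 2
    return 4               # needs the full four-byte form


def utf16_units(s):
    """The 16-bit code units a narrow Py_UNICODE buffer would hold for s."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]
```

A matcher ported to PEP 393 would have to handle all three widths returned
by `pep393_kind`, whereas one that keeps matching on 16-bit units has to
cope with the pairs that `utf16_units` produces for characters above U+FFFF.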

From what I've seen, re generates two different sets of functions at
compile time (with a stringlib-like approach), while regex uses a
run-time flag to choose between the two representations (where,
interestingly, the two code paths are explicitly spelled out, almost
duplicates of each other).
Matthew, please correct me if I'm wrong.
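
As a rough illustration of the difference, here is a Python analogy; it is
a sketch under stated assumptions, not code from either module. Python has
no compile-time templates, so the hypothetical factory `make_find` stands in
for the stringlib-style expansion that produces one specialised function per
width. `find_flag` (also hypothetical) is a single function that picks its
code path from a flag at call time, with both paths written out.

```python
def make_find(width):
    """Build a find() specialised for `width` bytes per unit.

    Stands in for generating one function per representation at
    compile time: the width is fixed when the function is created.
    """
    def find(buf, ch):
        for i in range(0, len(buf), width):
            if int.from_bytes(buf[i:i + width], "little") == ch:
                return i // width
        return -1
    return find


# One specialised function per representation, created once.
find_1byte = make_find(1)
find_2byte = make_find(2)


def find_flag(buf, ch, wide):
    """Single function; `wide` selects the code path at call time."""
    if wide:
        # Path for 2-byte units.
        for i in range(0, len(buf), 2):
            if int.from_bytes(buf[i:i + 2], "little") == ch:
                return i // 2
        return -1
    else:
        # Path for 1-byte units, nearly a copy of the one above.
        for i in range(len(buf)):
            if buf[i] == ch:
                return i
        return -1
```

With the first style, supporting a third representation means one more
expansion (here, `make_find(4)`); with the second, it means writing out a
third, nearly identical branch.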

Regards

Antoine.



