[Python-Dev] SRE incompatibility

Fredrik Lundh <effbot@telia.com>
Fri, 30 Jun 2000 19:53:45 +0200


tim wrote:
> > That doesn't help with regexes, of course, since a pattern might be
> > written as a regular string but be intended to match Unicode.  Maybe
> > the simplest rule is the best; always take 4 digits, even if it winds
> > up being incompatible with the \x in string literals.
>
> I vote for backward compatibility for now, and not only because that will
> irritate /F the most.

backward compatibility with what?  8-bit string literals or unicode
string literals?

the problem here is that the pattern is compiled once (from either
8-bit or unicode strings), and can then be used on either 8-bit or
unicode targets.  to be fully backwards compatible, this means that
the compiler should use 8 bits, no matter what string type you're
using.
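
a minimal sketch of the two escape widths, assuming the `re` module of a
current Python 3 rather than the 2.0b1 SRE discussed here (so this shows how
a later release behaves, not what b1 does): `\x` in a pattern consumes
exactly two hex digits, and a separate `\u` escape takes four.

```python
import re

# \x consumes exactly two hex digits, so r"\x412" means "A" followed by "2"
two_digit = re.compile(r"\x412")
matched_by_x = two_digit.fullmatch("A2") is not None

# \u takes four hex digits (accepted in str patterns)
four_digit = re.compile(r"\u0041")
matched_by_u = four_digit.fullmatch("A") is not None
```

with a four-digit `\x`, the first pattern would instead mean the single
character U+0412, which is the incompatibility tim's quoted text refers to.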

another solution would be to use the type of the pattern string to
choose between 8 and 16 bits.  I almost implemented that, before
I realized that it broke the following rather nice property:

    sre.compile("some pattern") == sre.compile(u"some pattern")

(well, the pattern type doesn't implement __cmp__, but you get the
idea).  the current implementation guarantees "==", but I'm planning
to change that to "is" (!).
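
a rough illustration of the "is" property, assuming CPython 3 and its `re`
compile cache (an implementation detail of a later release, not the 2.0b1
code): compiling the same pattern text twice hands back one object.  note
that in Python 3 the `u` prefix is a no-op, so this only shows the caching
side; the 8-bit case is represented by a bytes pattern, which is a separate
object tied to bytes targets.

```python
import re

# same pattern text compiled twice: CPython's re cache returns one object
first = re.compile("some pattern")
second = re.compile(u"some pattern")  # u prefix changes nothing in Python 3
same_object = first is second

# a bytes pattern is a distinct compiled object
bytes_pat = re.compile(b"some pattern")
distinct = bytes_pat is not first

# Python 3 rejects a str pattern applied to a bytes target
try:
    first.search(b"some pattern")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

so the later design ties the pattern to the type of the pattern string,
which is the alternative described above, and gives up using one compiled
pattern on both kinds of target.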

anyway, I suspect it's too late to change this in 2.0b1.  if enough
people complain about this, we can always label it a "critical bug",
and do something about it in b2.

</F>