[Python-3000] Regular expressions, py3k and unicode

Mark Dickinson dickinsm at gmail.com
Sun Jun 29 13:05:27 CEST 2008


On Sat, Jun 28, 2008 at 9:45 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Wouldn't it be more natural that, at least when the pattern is a str object
> rather than a bytes object, re.UNICODE be implied by default?

Might this have some unintended consequences?  For example, one
would then get the following undesirable behaviour from the decimal
module, using inputs with Unicode fullwidth digits.

>>> from decimal import Decimal
>>> Decimal('\uff11')
Decimal('1')
>>> Decimal('\uff11') == Decimal('1')
False

There are plenty of easy fixes for this, of course, but I don't know
how many other modules might be similarly affected.
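
As a rough illustration of one such fix (a sketch only, not the decimal
module's actual parsing code), a module that wants ASCII digits can spell
out the character class instead of relying on \d. The names
FULLWIDTH_ONE, loose, strict and is_ascii_digits are made up for this
example.

```python
import re

# U+FF11 FULLWIDTH DIGIT ONE, the character from the example above.
FULLWIDTH_ONE = '\uff11'

# With a str pattern and Unicode matching in effect, \d accepts any
# Unicode decimal digit, including the fullwidth ones.
loose = re.compile(r'\d+')

# An explicit character class accepts only the ten ASCII digits,
# regardless of which flags are in effect.
strict = re.compile(r'[0-9]+')


def is_ascii_digits(text):
    """Return True if text consists solely of ASCII digits 0-9."""
    return strict.fullmatch(text) is not None
```

Here loose.fullmatch(FULLWIDTH_ONE) succeeds on an interpreter where str
patterns match Unicode by default, while is_ascii_digits(FULLWIDTH_ONE)
is False.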

In any case, it seems to me that having something like re.ASCII
would be useful.
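
To make the idea concrete, here is a minimal sketch of what such a flag
would do. It assumes an re module that exposes re.ASCII and treats str
patterns as Unicode by default, which is how current Python 3 releases
behave; the variable names are purely illustrative.

```python
import re

# Same pattern, compiled with and without the ASCII-only flag.
unicode_digit = re.compile(r'\d')
ascii_digit = re.compile(r'\d', re.ASCII)

fullwidth_one = '\uff11'  # U+FF11 FULLWIDTH DIGIT ONE

# Unicode matching: the fullwidth digit counts as a digit.
matches_unicode = unicode_digit.fullmatch(fullwidth_one) is not None
# ASCII matching: only 0-9 count as digits.
matches_ascii = ascii_digit.fullmatch(fullwidth_one) is not None

# The flag affects the other shorthand classes too, for example \w.
sample = 'caf\u00e9'
unicode_word = re.match(r'\w+', sample).group()
ascii_word = re.match(r'\w+', sample, re.ASCII).group()
```

With the flag, a module could keep writing \d and \w in its patterns
and still get the ASCII-only behaviour it was written against.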

Mark

