[New-bugs-announce] [issue45869] Unicode and ASCII regular expressions do not agree on ASCII space characters

Joran van Apeldoorn report at bugs.python.org
Mon Nov 22 08:27:59 EST 2021


New submission from Joran van Apeldoorn <git at blubmail.nl>:

One would expect that the re.A (or re.ASCII) flag does not change how a regular expression matches strings consisting only of ASCII characters. However, for the characters 0x1c through 0x1f, the classes \s and \S differ: with re.ASCII these characters are not considered whitespace, while in the default Unicode mode they are.

Note that Python strings do consider these characters whitespace: '\x1c'.isspace() returns True.
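
A minimal sketch of the discrepancy, using only the standard re module. The variable names are illustrative, and this is not the attached script:

```python
import re

# The same pattern compiled with and without the ASCII flag.
unicode_space = re.compile(r"\s")
ascii_space = re.compile(r"\s", re.ASCII)

# Collect every ASCII code point on which the two patterns disagree.
differing = [
    hex(cp)
    for cp in range(128)
    if bool(unicode_space.match(chr(cp))) != bool(ascii_space.match(chr(cp)))
]
print(differing)

# str.isspace() sides with the Unicode-mode behaviour for these characters.
print([chr(cp).isspace() for cp in range(0x1C, 0x20)])
```

On the versions listed in this report, the first list should contain exactly the four code points 0x1c through 0x1f.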

For ASCII-only input, all other character classes match the same characters in Unicode and ASCII mode.
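
One way to check this claim is to compare the remaining shorthand classes over all 128 ASCII characters. The helper name disagreements is hypothetical and only serves this sketch:

```python
import re

ascii_chars = [chr(cp) for cp in range(128)]


def disagreements(pattern):
    """Return the ASCII characters matched differently with and without re.ASCII."""
    uni = re.compile(pattern)
    asc = re.compile(pattern, re.ASCII)
    return [c for c in ascii_chars if bool(uni.match(c)) != bool(asc.match(c))]


# An empty list means both modes agree on every ASCII character.
for cls in (r"\d", r"\D", r"\w", r"\W"):
    print(cls, disagreements(cls))
```

Passing r"\s" or r"\S" to the same helper should list only the four characters described above.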

----------
components: Regular Expressions
files: unicode-ascii-space.py
messages: 406773
nosy: control-k, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: Unicode and ASCII regular expressions do not agree on ASCII space characters
versions: Python 3.10, Python 3.11, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file50457/unicode-ascii-space.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45869>
_______________________________________

