[New-bugs-announce] [issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

Tom Christiansen report at bugs.python.org
Thu Aug 11 21:18:31 CEST 2011


New submission from Tom Christiansen <tchrist at perl.com>:

Python's lib re cannot be used for handling Unicode regular expressions because it violates UTS#18 (Unicode Regular Expressions), specifically requirement RL1.2a on compatibility properties.  That requirement spells out exactly what \w is allowed to match, but Python uses its own definition.  Because this is a clear violation of the standard, it is misleading and wrong for Python to claim that the re.UNICODE flag makes \w and friends match Unicode.  Here are the test cases that fail when the attached file is run under v3.2; there are further failures when it is run under v2.7.

FAIL lib re    found non alphanumeric string café
FAIL lib re    found non alphanumeric string Ⓚ
FAIL lib re    found non alphanumeric string ͅ
FAIL lib re    found non alphanumeric string ְ
FAIL lib re    found non alphanumeric string 𝟘
FAIL lib re    found non alphanumeric string 𐍁
FAIL lib re    found non alphanumeric string 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
FAIL lib re    found non alphanumeric string 𐐔𐐯𐑅𐐨𐑉𐐯𐐻
FAIL lib re    found non alphanumeric string connector‿punctuation
FAIL lib re    found non alphanumeric string Ὰͅ_Στο_Διάολο
FAIL lib re    found non alphanumeric string 𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂‿𐌸𐌿‿𐌹𐌽‿𐌷𐌹𐌼𐌹𐌽𐌰𐌼
FAIL lib re    found all alphanumeric string ¹²³
FAIL lib re    found all alphanumeric string ₁₂₃
FAIL lib re    found all alphanumeric string ¼½¾
FAIL lib re    found all alphanumeric string ⑶
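
The two kinds of failure listed above (strings that should match \w+ but do not, and strings that should not match but do) can be reproduced with a small sketch like the following. It is not the attached alnum.python. The helper uts18_word_char_approx is hypothetical and only approximates RL1.2a from general categories: the standard defines \w through the Alphabetic property, which the unicodedata module does not expose, so characters that are alphabetic only through Other_Alphabetic (such as Ⓚ) are missed by it. The results assume a recent CPython 3 release; the v3.2 and v2.7 builds in the report may behave differently, particularly for characters outside the BMP.

```python
import re
import unicodedata


def uts18_word_char_approx(ch):
    """Hypothetical approximation of UTS#18 RL1.2a \\w for one character.

    Uses general categories only, so it is NOT a full implementation:
    the Alphabetic property (including Other_Alphabetic) is not
    available from unicodedata.
    """
    cat = unicodedata.category(ch)
    return (
        cat.startswith("L")       # letters
        or cat == "Nl"            # letter numbers (part of Alphabetic)
        or cat.startswith("M")    # marks: \p{gc=Mark}
        or cat == "Nd"            # decimal digits: \p{gc=Decimal_Number}
        or cat == "Pc"            # \p{gc=Connector_Punctuation}
        or ch in "\u200c\u200d"   # Join_Control (ZWNJ, ZWJ)
    )


def re_matches_word(s):
    """True if the standard library's re treats all of s as \\w characters."""
    return re.fullmatch(r"\w+", s) is not None


def approx_matches_word(s):
    """True if every character of s passes the approximate RL1.2a check."""
    return len(s) > 0 and all(uts18_word_char_approx(c) for c in s)


samples = {
    "cafe + combining acute": "cafe\u0301",
    "combining ypogegrammeni": "\u0345",
    "undertie (connector punctuation)": "\u203f",
    "superscript digits": "\u00b9\u00b2\u00b3",
    "vulgar fractions": "\u00bc\u00bd\u00be",
}

for label, text in samples.items():
    print(f"{label}: re={re_matches_word(text)} "
          f"approx_RL1.2a={approx_matches_word(text)}")
```

Wherever the two columns disagree, lib re and the (approximated) standard disagree about what counts as a word character.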

Note that Matthew Barnett's regex lib for Python handles all of these cases in conformance with The Unicode Standard.

----------
components: Regular Expressions
files: alnum.python
messages: 141920
nosy: tchrist
priority: normal
severity: normal
status: open
title: python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file22881/alnum.python

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12731>
_______________________________________

