[New-bugs-announce] [issue11947] re.IGNORECASE does not match literal "_" (underscore)
Robert Meerman
report at bugs.python.org
Thu Apr 28 19:02:56 CEST 2011
New submission from Robert Meerman <robert.meerman at gmail.com>:
Regular expressions which are written to match literal underscores ("_", ASCII
ordinal 95) and which specify `re.IGNORECASE` during compilation do not
consistently match underscores: some occurrences are matched, but others are not.
The following session log shows the problem:
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"
>>> print subject.encode("base64") # In case my environment encoding is to blame
W0NvbmNsYXZlLU1lbmRvaV1fZWZfLV9hX3RhbGVfb2ZfbWVtb3JpZXNfMDAtMTJfSDI2NA==
>>> re.sub("_", "X", subject) # No flags, does what I expect
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE) # Misses some matches
'[Conclave-Mendoi]XefX-_a_tale_of_memories_00-12_H264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE) # Misses fewer matches
'[Conclave-Mendoi]XefX-XaXtaleXofXmemories_00-12_H264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.LOCALE | re.UNICODE) # Works OK
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> re.sub("_", "X", subject, re.IGNORECASE | re.UNICODE) # Works OK
'[Conclave-Mendoi]XefX-XaXtaleXofXmemoriesX00-12XH264'
>>>
>>> type(subject) # Don't think this is a unicode string
<type 'str'>
>>>
Since my `subject` variable is of type `str` and contains only ASCII characters,
I do not believe that the `re.UNICODE` flag should be required.
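One thing worth ruling out, offered as a possible explanation rather than a
confirmed diagnosis: the documented signature is
`re.sub(pattern, repl, string, count=0)`, so a flag passed as the fourth
positional argument may be read as `count` (a cap on the number of
replacements) rather than as a flag. The sketch below compiles the pattern with
the flag via `re.compile` and, separately, passes `count=2` explicitly for
comparison. The variable names `compiled_result` and `counted_result` are
illustrative only.

```python
import re

subject = "[Conclave-Mendoi]_ef_-_a_tale_of_memories_00-12_H264"

# Compile the pattern with the flag, so the flag cannot be mistaken
# for any other argument of re.sub.
pattern = re.compile("_", re.IGNORECASE)
compiled_result = pattern.sub("X", subject)

# For comparison: an explicit count of 2 caps the number of replacements.
# (Assumption being tested: re.IGNORECASE has the integer value 2.)
counted_result = re.sub("_", "X", subject, count=2)

print(compiled_result)
print(counted_result)
print(int(re.IGNORECASE))
```

If `counted_result` matches the "Misses some matches" output in the session log
and `compiled_result` replaces every underscore, the flag itself is probably not
at fault.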
----------
components: Regular Expressions
messages: 134700
nosy: RobM, effbot, ezio.melotti, pitrou
priority: normal
severity: normal
status: open
title: re.IGNORECASE does not match literal "_" (underscore)
type: behavior
versions: Python 2.6
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11947>
_______________________________________