[New-bugs-announce] [issue31677] email.header uses re.IGNORECASE without re.ASCII
INADA Naoki
report at bugs.python.org
Tue Oct 3 08:58:02 EDT 2017
New submission from INADA Naoki <songofacandy at gmail.com>:
email.header has this pattern:
https://github.com/python/cpython/blob/85c0b8941f0c8ef3ed787c9d504712c6ad3eb5d3/Lib/email/header.py#L34-L43
# Match encoded-word strings in the form =?charset?q?Hello_World?=
ecre = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE)
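Since only a few ASCII letters (such as 's' and 'i') have non-ASCII characters that match them case-insensitively, and this pattern applies case-insensitivity only to [qb], this is not a real bug.
But using re.ASCII would be safer.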
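
To illustrate the difference, here is a minimal sketch. The names LONG_S, unicode_s, ascii_s and ecre_ascii are made up for this example and are not part of email.header. It shows that a case-insensitive 's' also matches U+017F unless re.ASCII is set, and that the pattern above still parses an ordinary encoded-word when re.ASCII is added:

```python
import re

LONG_S = '\u017f'  # LATIN SMALL LETTER LONG S, a non-ASCII case variant of 's'

# Default (Unicode) matching for str patterns: 's' also matches U+017F.
unicode_s = re.compile(r's', re.IGNORECASE)
# With re.ASCII, case-insensitive matching is restricted to ASCII letters.
ascii_s = re.compile(r's', re.IGNORECASE | re.ASCII)

unicode_hit = unicode_s.fullmatch(LONG_S) is not None
ascii_hit = ascii_s.fullmatch(LONG_S) is not None

# The encoded-word pattern from this report with re.ASCII added
# (a sketch of the suggested change, not the committed fix).
ecre_ascii = re.compile(r'''
  =\?                   # literal =?
  (?P<charset>[^?]*?)   # non-greedy up to the next ? is the charset
  \?                    # literal ?
  (?P<encoding>[qb])    # either a "q" or a "b", case insensitive
  \?                    # literal ?
  (?P<encoded>.*?)      # non-greedy up to the next ?= is the encoded string
  \?=                   # literal ?=
  ''', re.VERBOSE | re.IGNORECASE | re.MULTILINE | re.ASCII)

m = ecre_ascii.search('Subject: =?utf-8?Q?Hello_World?=')
fields = m.group('charset', 'encoding', 'encoded')
```

Because the only letters matched case-insensitively here are 'q' and 'b', adding re.ASCII is not expected to change which encoded-words are recognized; it only makes the intent explicit.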
Additionally, email.utils has had the same pattern for 10 years, and it is not used anywhere.
It should be removed.
----------
components: Regular Expressions
messages: 303612
nosy: ezio.melotti, inada.naoki, mrabarnett
priority: normal
severity: normal
status: open
title: email.header uses re.IGNORECASE without re.ASCII
versions: Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31677>
_______________________________________