[issue42484] parse_message_id, get_msg_id, get_obs_local_part is poorly written

Dickson Chan report at bugs.python.org
Fri Nov 27 07:36:59 EST 2020


New submission from Dickson Chan <dxn126 at gmail.com>:

parse_message_id in the email module crashes with bogus message-id

Having a Message-ID '<[>' gives me an IndexError: list index out of range

This happens when
- creating an EmailMessage with that Message-ID:
    msg = EmailMessage()
    msg['Message-ID'] = '<[>'

- accessing the bogus Message-ID through
    msg.items()
  or
    msg.get('Message-ID')
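
A self-contained version of the reproduction, as a sketch: the helper name try_bogus_message_id is made up for illustration, and the result depends on the interpreter version (an affected 3.8/3.9 should report IndexError, while versions without the bug should report None).

```python
from email.message import EmailMessage


def try_bogus_message_id(value="<[>"):
    """Return the name of the exception raised, or None if the value is accepted.

    Hypothetical helper for illustration only; it is not part of the email package.
    """
    msg = EmailMessage()
    try:
        # With the default policy the header value is parsed on assignment.
        msg["Message-ID"] = value
        # Reading the header back is the second place the report mentions.
        msg.get("Message-ID")
    except Exception as exc:  # deliberately broad: we only report what happened
        return type(exc).__name__
    return None


print(try_bogus_message_id())
```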

This doesn't happen with Python 3.6 or 3.7, where MessageIDHeader didn't exist yet.

See 3.8/Lib/email/headerregistry.py, line 542:

_default_header_map = {
    ....
    'message-id': MessageIDHeader,
    }

-------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '[>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    msg['Message-ID'] = '<[>'
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 607, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 202, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 535, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2126, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2073, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1516, in get_obs_local_part
    if (obs_local_part[0].token_type == 'dot' or
IndexError: list index out of range
-------------------------------------------------------------------------------------------

As the traceback shows, get_msg_id() calls get_obs_local_part(),
and get_obs_local_part() contains this:

def get_obs_local_part(value):

    obs_local_part = ObsLocalPart()

    while value and (value[0]=='\\' or value[0] not in PHRASE_ENDS):
        ...
    if obs_local_part[0].token_type == 'dot':
        ...

If value does not satisfy the while-loop condition on the first check,
the loop body never runs, obs_local_part stays empty, and
obs_local_part[0] raises an IndexError.
(The value in my example is '[>', from the Message-ID '<[>'.)
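
The failure pattern and one possible guard can be sketched in isolation. This is a toy model, not the code in _header_value_parser.py: DEMO_ENDS is an assumed stand-in for PHRASE_ENDS, collect_local_part is a hypothetical name, and each character is treated as one token.

```python
from email.errors import HeaderParseError

# Assumed stand-in for the parser's PHRASE_ENDS set (illustration only).
DEMO_ENDS = set('()<>@,:;[]')


def collect_local_part(value):
    """Toy model of the loop in get_obs_local_part, with an empty-result guard."""
    tokens = []
    while value and (value[0] == "\\" or value[0] not in DEMO_ENDS):
        tokens.append(value[0])
        value = value[1:]
    # Guard: if the loop never ran, fail with a parse error
    # instead of indexing into an empty list.
    if not tokens:
        raise HeaderParseError(
            "expected obs-local-part but found {!r}".format(value))
    return tokens, value


try:
    collect_local_part("[>")
except HeaderParseError as exc:
    print("parse error:", exc)
```

Raising HeaderParseError here matters because the callers in the traceback already deal with that exception type, whereas an IndexError passes straight through them.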

Shouldn't we get a proper error, or fall back to no parsing, when parsing fails?
As far as I can tell, there's no way to bypass the parser and get the raw
Message-ID, and I can't even handle the error with a try/except.
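
One candidate for reading the raw value when the message comes from text, offered as a sketch and not as a fix: the compat32 policy returns header values as plain strings and does not use the header registry, so MessageIDHeader is never involved. The sample message text below is invented for illustration.

```python
import email
from email import policy

# Invented sample message; only the Message-ID value comes from the report.
RAW = "Message-ID: <[>\nSubject: test\n\nbody\n"

# compat32 keeps header values as plain strings (no MessageIDHeader parsing).
msg = email.message_from_string(RAW, policy=policy.compat32)
raw_message_id = msg["Message-ID"]
print(repr(raw_message_id))
```

The trade-off is that compat32 gives up the structured header objects of the newer policies for every header, not only Message-ID.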

----------
components: email
messages: 381947
nosy: barry, dxn126, r.david.murray
priority: normal
severity: normal
status: open
title: parse_message_id, get_msg_id, get_obs_local_part is poorly written
type: behavior
versions: Python 3.8, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42484>
_______________________________________

