[New-bugs-announce] [issue40359] email.parse part.get_filename() fails to unwrap long attachment file names
Matthew Davis
report at bugs.python.org
Tue Apr 21 20:26:27 EDT 2020
New submission from Matthew Davis <ubuntu.one at mdavis.xyz>:
# Summary
When parsing emails with long attachment file names, part.get_filename() often returns a file name that contains \n or \r\n.
It should strip those characters out.
# Steps to reproduce
I have attached a minimal working example.
The relevant part of the raw email is:
--_004_D6CEDE1EBD6645898F5643C0C6878005examplecom_
Content-Type: text/plain;
name="an attachment with a very very very long super long file name which has
many words and just keeps on going and going.txt"
# Expected output:
attachments = ["an attachment with a very very very long super long file name which has many words and just keeps on going and going.txt"]
Maybe I'm reading the email RFC spec wrong. My interpretation is that the parser should do something like:
raw = raw.replace('\r\n ', ' ').replace('\n ', ' ')
# Actual output
attachments = ["an attachment with a very very very long super long file name which\n has many words and just keeps on going and going.txt"]
Note that I have seen other examples where the output includes \r\n, not just \n.
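The unfolding rule described above can be sketched as a small workaround applied to the value returned by part.get_filename(). This is only an illustration under stated assumptions, not a fix to the email package: unfold_filename is a hypothetical helper name, and the message below is a made-up, cut-down stand-in for the attached example. It removes each line break that is immediately followed by a space or tab and keeps that whitespace character, which covers both the \n and the \r\n cases.

```python
import re
from email import message_from_string

# Hypothetical helper (not part of the email package): remove every line
# break that is immediately followed by a space or tab, keeping that
# whitespace character in place.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold_filename(name):
    """Return name with folded line breaks removed."""
    return _FOLD.sub("", name)


# Made-up, cut-down message whose "name" parameter is folded across lines.
raw = (
    'Content-Type: text/plain;\n'
    ' name="a long\n'
    ' file name.txt"\n'
    '\n'
    'body\n'
)

part = message_from_string(raw)
clean = unfold_filename(part.get_filename())
print(repr(clean))
```

The helper only touches line breaks that are followed by whitespace, so a file name without folding passes through unchanged.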
# Impact
I'm trying to write an email bot which saves attachments to a database, and also forwards on the emails.
My bot thinks that the filename includes a line break. This inevitably causes failures in my subsequent code.
# Relevant links:
The relevant section of the email RFC is here: https://tools.ietf.org/html/rfc2822.html#section-2.2.3
This Stack Overflow answer seems relevant: https://stackoverflow.com/questions/3050298/parsing-email-with-python/3050374#3050374
Issue 3601 may be relevant, but doesn't seem exactly the same. It seems to be the reverse, *constructing* emails with long headers. My issue is *parsing* emails with long headers.
----------
components: email
files: mwe.py
messages: 366963
nosy: barry, matt-davis, r.david.murray
priority: normal
severity: normal
status: open
title: email.parse part.get_filename() fails to unwrap long attachment file names
versions: Python 3.6
Added file: https://bugs.python.org/file49083/mwe.py
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40359>
_______________________________________