[Python-bugs-list] [ python-Bugs-795081 ] email.Message param parsing problem II
SourceForge.net
noreply at sourceforge.net
Tue Aug 26 05:58:03 EDT 2003
Bugs item #795081, was opened at 2003-08-25 23:37
Message generated for change (Settings changed) made by bwarsaw
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=795081&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Stuart D. Gathman (customdesigned)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Message param parsing problem II
Initial Comment:
The enclosed real life (inactivated) virus message
causes email.Message to fail to find the multipart
attachments. This is because the headers following
Content-Type are indented, causing email.Message to
properly append them to Content-Type. The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook
apparently gets a value of 'bound' for boundary,
whereas email.Message gets the value
'"bound"\n\tX-Priority...'. email.Utils.unquote
apparently gives up and doesn't remove any quotes.
I believe that unquote should return just what is
between the quotes, so that '"abc" def' would be
unquoted to 'abc'. In fact, my email filtering
software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version
of unquote using this heuristic. I believe that the header
used by the virus is invalid, so a STRICT parser should
reject it, but a tolerant parser (such as a virus
scanner would use) should use the heuristic.
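As a minimal sketch of that heuristic, assuming a standalone helper
(lenient_unquote is a hypothetical name, not part of the email package),
the idea is to keep only what lies between the opening quote and the
next quote, and to drop whatever follows. Backslash escapes are ignored
here to keep the example short.

```python
def lenient_unquote(value):
    """Return only the text between the first pair of double quotes.

    Hypothetical helper for illustration; not the email package's API.
    """
    if not value.startswith('"'):
        # Not a quoted string: nothing to strip.
        return value
    closing = value.find('"', 1)
    if closing == -1:
        # No closing quote at all: leave the value untouched.
        return value
    # Keep the quoted part and discard any trailing garbage.
    return value[1:closing]


print(lenient_unquote('"abc" def'))
print(lenient_unquote('"bound"\n\tX-Priority: 3'))
```

With this rule both '"abc" def' and the boundary value from the virus
message reduce to the quoted part alone.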
Here is a brief script to show the problem (attached
file in test/virus5):
----------t.py----------
import email
msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary',
'"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority:
Normal\n\tX-Mailer: Microsoft Outlook Express
5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft
MimeOLE V5.50.4522.1300')]
----------------------------------------------------------------------
Comment By: Stuart D. Gathman (customdesigned)
Date: 2003-08-25 23:57
Message:
Logged In: YES
user_id=142072
Here is a proposed fix for email.Utils.unquote (except it
should test for a 'strict' mode flag, which is currently only
in Parser):
def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
            if str.endswith('"'):
                str = str[1:-1]
            else:  # remove garbage after trailing quote
                try:
                    str = str[1:str[1:].index('"') + 1]
                except ValueError:  # no closing quote found
                    return str
            # undo backslash escaping inside the quoted string
            return str.replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str
Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact. That would also be a
good place to check for a STRICT flag stored with the
message object. Perhaps the Parser should set the Message
_strict flag from its own _strict flag.
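
One way the STRICT check might look, as a rough sketch only: the
function name unquote_value, the strict argument and the use of
ValueError are all assumptions made for illustration, not anything the
email package provides. In tolerant mode it applies the heuristic; in
strict mode it rejects a quoted value that has text after the closing
quote.

```python
def unquote_value(value, strict=False):
    """Unquote a parameter value, tolerating trailing garbage unless strict.

    Hypothetical sketch: the name, the strict argument and the choice of
    ValueError are assumptions, not the email package's API.
    """
    if len(value) > 1 and value.startswith('"'):
        if value.endswith('"'):
            inner = value[1:-1]
        else:
            closing = value.find('"', 1)
            if closing == -1:
                # Unterminated quote: nothing sensible to strip.
                return value
            if strict:
                # A strict parser refuses text after the closing quote.
                raise ValueError('garbage after closing quote: %r'
                                 % value[closing + 1:])
            inner = value[1:closing]
        # Undo backslash escaping inside the quoted string.
        return inner.replace('\\\\', '\\').replace('\\"', '"')
    return value


print(unquote_value('"bound"\n\tX-Priority: 3'))
```

A Message carrying a _strict flag could pass it through as the strict
argument, so a virus scanner gets the tolerant behaviour by default
while a strict parser rejects the header.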
----------------------------------------------------------------------