Part of RFC 822 ignored by email module

Martin Gregorie martin at address-in-sig.invalid
Thu Jan 20 18:52:18 EST 2011


On Thu, 20 Jan 2011 17:58:36 -0500, Bob Kline wrote:
 
> Thanks.  I'm not sure everyone would agree that it's OK to collapse
> multiple consecutive spaces into one, but I'm beginning to suspect that
> those more concerned with preserving as much as possible of the original
> message are in the minority.  It sounds like my take-home distillation
> from this thread is "yes, the module ignores what the spec says about
> unfolding, but it doesn't matter."  I guess I can live with that.
>
I've been doing stuff in this area with the JavaMail package, though not 
as yet in Python. I've learnt that when you parse the headers you can 
extract values that work well for comparisons, as database keys, etc., 
but they are not guaranteed to let you reconstitute the original header 
byte for byte. If preserving the message exactly as received matters, 
the solution is to parse the message to extract the headers and MIME 
parts the application needs to carry out its function, but keep the 
original, unparsed message so you can pass it on.
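
A rough Python sketch of that approach, assuming Python 3's standard 
email package. The sample message, the ingest() helper and the record 
layout are made up for illustration and are not part of any library:

```python
from email import message_from_bytes

# Hypothetical sample message with a folded Subject header.
raw = (b"From: alice@example.com\r\n"
       b"Subject: a folded\r\n"
       b"   subject  line\r\n"
       b"\r\n"
       b"Body text\r\n")


def ingest(raw_bytes):
    """Parse for lookup values, but keep the untouched bytes alongside."""
    msg = message_from_bytes(raw_bytes)
    # Collapse all runs of whitespace (including the fold) so the value
    # works as a comparison or database key. This is lossy on purpose.
    subject_key = " ".join(msg.get("Subject", "").split())
    return {
        "raw": raw_bytes,            # forward this, never the re-serialised parse
        "subject_key": subject_key,  # for searching and comparing only
        "sender": msg.get("From"),
    }


record = ingest(raw)
print(record["subject_key"])
```

The parsed values are only ever used for lookups; anything that has to 
be passed on is taken from record["raw"].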

The other gotcha is assuming that the MUA author read and understood the 
RFCs. Very many barely glanced at the RFCs and/or misunderstood them. 
Consequently, if you use strict parsing you'll be surprised how many 
messages get rejected for having invalid headers or MIME headers. For 
instance, the mistakes some MUAs make when outputting To, CC and BCC 
headers with multiple addresses have to be seen to be believed. If the 
Python e-mail module lets you, set it to use lenient parsing. If this 
isn't an option you may well find yourself having to fix up messages 
before you can parse them successfully.
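
For what it's worth, a sketch of how this looks with Python 3's email 
package, as far as I understand it: the parser is lenient by default and 
records problems in the message's defects list instead of raising, 
while email.policy.strict turns defects into exceptions. The broken 
message and the two helper functions below are hypothetical:

```python
from email import errors, message_from_bytes, policy

# Hypothetical malformed message: claims to be multipart but gives
# no boundary parameter.
bad = (b"From: alice@example.com\r\n"
       b"Content-Type: multipart/mixed\r\n"
       b"\r\n"
       b"some body text\r\n")


def parse_leniently(raw_bytes):
    """Default parsing: never raises, problems are collected as defects."""
    msg = message_from_bytes(raw_bytes)
    return msg, list(msg.defects)


def parse_strictly(raw_bytes):
    """Strict parsing: the first defect is raised as an exception."""
    return message_from_bytes(raw_bytes, policy=policy.strict)


msg, defects = parse_leniently(bad)
for defect in defects:
    print("defect:", type(defect).__name__)

try:
    parse_strictly(bad)
    rejected = False
except errors.MessageDefect as exc:
    rejected = True
    print("strict parse rejected message:", type(exc).__name__)
```

With the lenient parse you still get a usable message object and can 
decide for yourself which defects matter to the application.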


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


