How can I get text of the body (payload) of an email?

Josiah Carlson jcarlson at uci.edu
Sat Oct 16 11:31:56 EDT 2004


> I'm puzzled.  Josiah suggested that this would allow me to get the
> payload of an email message.
> 
> body = message.split('\r\n\r\n', 1)[1]
> 
> As I understand it, the headers of an email are terminated by a blank
> line, after which comes the message payload.  A blank line being
> represented by \r\n\r\n
> 
> After trying Josiah's above suggestion on many emails and failing to
> get it to work, I found that in fact the following works:
> 
> self.raw_data.split('\n\n', 1)[0]
> 
> But this doesn't agree with my understanding of the RFC822 email
> format, which is that the blank line should be represented by \r\n\r\n
> 
> Can anyone suggest where my understanding is wrong?
> Thanks


Your understanding isn't wrong, but somehow you are acquiring emails
with line-feed-only line endings.  This may be a case of opening a file
in text mode and getting universal line-ending support (which tosses
'\r').  It could also be that some other processing you do is stripping
it out (I don't use the email package, so I don't know what it may or
may not be doing).

A known method of normalizing line endings for data that could come from
anywhere is through the use of regular expressions:

import re
email = re.sub('(\r\n|\r|\n)', '\r\n', email_with_ambiguous_line_endings)
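
Putting the normalization and the header/body split together, here is a
minimal sketch.  The helper names normalize_line_endings and split_body
and the sample message are made up for illustration; the only library
call is re.sub, which takes the pattern, then the replacement, then the
string to work on.

```python
import re

def normalize_line_endings(text):
    # '\r\n' is listed first so a CRLF pair is matched as one line break,
    # not as a bare '\r' followed by a bare '\n'.
    return re.sub(r'\r\n|\r|\n', '\r\n', text)

def split_body(raw_message):
    # Headers end at the first blank line; the rest is the payload.
    normalized = normalize_line_endings(raw_message)
    parts = normalized.split('\r\n\r\n', 1)
    # No blank line means headers only, so return an empty body.
    return parts[1] if len(parts) == 2 else ''

# Hypothetical message that arrived with bare '\n' line endings.
sample = 'From: a@example.com\nSubject: hi\n\nHello,\nworld\n'
print(repr(split_body(sample)))
```

Because the split happens after normalization, the same call works
whether the input used '\r\n', '\r', or '\n'.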


If you know your data to be good on disk, perhaps it would be better to
open files as 'rb' to make sure that universal line ending support is
not used.
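
As a rough illustration of the difference, the sketch below writes a
sample message to a temporary file and reads it back both ways.  It
assumes Python 3, where 'rb' returns bytes and text mode translates
line endings by default; the sample message and variable names are
hypothetical.

```python
import os
import tempfile

# Hypothetical message stored on disk with proper CRLF line endings.
raw = b'From: a@example.com\r\nSubject: hi\r\n\r\nHello\r\n'

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

try:
    # Binary mode: bytes come back exactly as stored on disk.
    with open(path, 'rb') as f:
        binary_data = f.read()
    # Text mode: the default newline handling turns '\r\n' into '\n'.
    with open(path, 'r', encoding='ascii') as f:
        text_data = f.read()
finally:
    os.remove(path)

# With the '\r\n' intact, the blank-line split works as expected.
body = binary_data.split(b'\r\n\r\n', 1)[1]
print(repr(body))
```

With the text-mode data, the same split on '\r\n\r\n' would find no
separator, which matches the behaviour described in the question.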

 - Josiah



