[Tutor] Email and MIME

Kent Johnson kent37 at tds.net
Thu Sep 11 13:06:20 CEST 2008


On Wed, Sep 10, 2008 at 10:30 PM, grishma govani <grishma20 at gmail.com> wrote:
> Yes, I used the part of the code from the second link.
> I am using the mailbox modules too.
>
> I have the e-mails from gmail in a file on my computer. I have used the code
> below to extract all the headers. As you can see, for now I am using the text
> stored in document as my body. I just want to extract the plain text and leave
> out all the html, duplicates of plain text and all the other information like
> content type, from, etc.

If you have an mbox format file, I suggest using mailbox.mbox instead
of UnixMailbox. UnixMailbox is an older class kept for backward
compatibility, and it uses the deprecated rfc822.Message to return the
messages. rfc822.Message doesn't seem to have any ability to decode the
message body, so you are getting the raw message data.

mailbox.mbox returns mailbox.mboxMessage objects, a subclass of
email.Message.Message, which understand encoding and multipart
messages. Instead of msg.fp.read() you would use the richer
email.Message methods, such as walk(), get_content_type() and
get_payload(decode=True), to retrieve the data. See the last example at
http://docs.python.org/lib/node161.html
for some hints.
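
As a rough sketch of that approach (not tested against your Feedback
file), something like the following walks each message and keeps only
the text/plain parts, which also drops the html duplicate found in
multipart/alternative mail. The function names plain_text_parts and
extract_bodies are made up for this example, and treating any part that
has a filename as an attachment is an assumption you may want to change.
It avoids print so that it also runs unchanged on Python 3.

```python
import mailbox


def plain_text_parts(msg):
    """Yield the decoded text of each text/plain part that is not an attachment."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold other parts, not text
        if part.get_content_type() != 'text/plain':
            continue  # skips text/html and binary parts
        if part.get_filename() is not None:
            continue  # assumption: a named part is an attachment
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        # Fall back to ascii when the part declares no charset.
        charset = part.get_content_charset() or 'ascii'
        yield payload.decode(charset, 'replace')


def extract_bodies(path):
    """Return a list of (from, subject, body) tuples for the mbox file at path."""
    results = []
    # create=False raises an error instead of silently creating a missing file.
    mbox = mailbox.mbox(path, create=False)
    try:
        for msg in mbox:
            body = '\n'.join(plain_text_parts(msg))
            results.append((msg['From'], msg['Subject'], body))
    finally:
        mbox.close()
    return results
```

You would call it as extract_bodies('tmp/automated/Feedback') and write
the bodies to Feedback.txt yourself. A part that names a charset Python
does not know will still raise LookupError, so real mail may need a
try/except around the decode.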

Kent

PS Please use Reply All to reply on list.

>
> mb = mailbox.UnixMailbox(file('tmp/automated/Feedback', 'r'))
> fout = file('Feedback.txt', 'w')
> msg = mb.next()
>
> while msg is not None:
>    document = msg.fp.read()
>    document = passthrough_filter(msg, document)
>    msg = mb.next()
>
>
> def passthrough_filter(msg, document):
>    """This prints the 'from' address of the message and
>    returns the document unchanged.
>    """
>    from_addr = msg.getaddr('From')[0]
>    Sub = msg.get('Subject')
>    ContentType = msg.get('Content-Type')
>    ContentDisp = msg.get('Content-Disposition')
>    print "From:",from_addr
>    print "Subject:",Sub
>    print "Attachment:",None
>    print "Body:",document
>    print '\n'
>    return document
>
>
>
>
> On 10 Sep 2008, at 22:09, Kent Johnson wrote:
>
>> On Wed, Sep 10, 2008 at 4:06 PM, grishma govani <grishma20 at gmail.com>
>> wrote:
>>>
>>> Hello Everybody,
>>>
>>> I have been trying to extract the body of all the email messages from an
>>> mbox file.
>>
>> How are you doing this? Have you seen the mailbox module and this recipe:
>> http://docs.python.org/lib/mailbox-mbox.html
>> http://code.activestate.com/recipes/157437/
>>
>> Kent
>
>

