Reading Outlook .msg file using Python

Jon Clements joncle at googlemail.com
Thu Oct 21 04:34:00 EDT 2010


On 20 Oct, 18:13, John Henry <john106he... at hotmail.com> wrote:
> On Oct 20, 9:01 am, John Henry <john106he... at hotmail.com> wrote:
>
>
>
> > On Oct 20, 1:41 am, Tim Golden <m... at timgolden.me.uk> wrote:
>
> > > On 19/10/2010 22:48, John Henry wrote:
>
> > > > Looks like this flag is valid only if you are getting messages
> > > > directly from Outlook.  When reading the msg file, the flag is
> > > > invalid.
>
> > > > Same issue when accessing attachments.  In addition, the MAPITable
> > > > method does not seem to work at all when trying to get attachments out
> > > > of the msg file (works when dealing with message in an Outlook
> > > > mailbox).  Either way, the display_name doesn't work when trying to
> > > > display the filename of the attachment.
>
> > > > I was able to get the date by using the PR_TRANSPORT_MESSAGE_HEADERS
> > > > mapitags
>
> > > Ah, thanks. As you will have realised, my code is basically geared
> > > to reading an Outlook/Exchange message box. I hadn't really tried
> > > it on individual message files, except my original excerpt. If it
> > > were opportune, I'd be interested in seeing your working code.
>
> > > TJG
>
> > When (and if) I finally figure out how to get it done, I surely will
> > make the code available.  It's pretty close.  All I need is to figure
> > out how to extract the attachments.
>
> > Too bad I don't know (and don't have) C#.  This guy did it so cleanly:
>
> > http://www.codeproject.com/KB/office/reading_an_outlook_msg.aspx?msg=...
>
> > Maybe somebody who knows both C# and Python can convert the code
> > (not much code) and then the Python community will have it.  As it
> > stands, it seems the solution is available in Java, C#, VB .... but
> > not Python.
>
> BTW: For the benefit of future search on this topic, with the code
> listed above where:
>
> storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE
>
> I had to change it to:
>
> storage_flags = STGM_DIRECT | STGM_READ | STGM_SHARE_DENY_NONE | STGM_TRANSACTED
>
> otherwise I get a sharing violation (see
> http://efreedom.com/Question/1-1086814/Opening-OLE-Compound-Documents...).
>
> For now, I am using a brute force method
> (http://mail.python.org/pipermail/python-win32/2009-February/008825.html)
> to get the names of
> the attachments and if I need to extract the attachments, I pop up the
> message in Outlook and let Outlook extract the files.  Ugly but fits
> my client's need for now.  Hopefully there will be a cleaner solution
> down the road.
>
> Here's my code for brute forcing attachments out of the msg file (very
> ugly):
>
>         def get_attachments(self, fileID):
>                 #from win32com.storagecon import *
>                 from win32com import storagecon
>                 import pythoncom
>
>                 flags = (storagecon.STGM_READ | storagecon.STGM_SHARE_DENY_NONE |
>                          storagecon.STGM_TRANSACTED)
>                 try:
>                         storage = pythoncom.StgOpenStorage (fileID, None, flags)
>                 except:
>                         return []
>
>                 flags = storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE
>                 attachments=[]
>                 for data in storage.EnumElements ():
>                         print data[0], data[1]
>                         if data[1] == 2 or data[0] == "__substg1.0_007D001F":
>                                 stream = storage.OpenStream (data[0], None, flags)
>                                 try:
>                                         msg = stream.Read (data[2])
>                                 except:
>                                         pass
>                                 else:
>                                         msg = repr (msg).replace("\\x00","").strip("'").replace("%23","#")
>                                         if data[0] == "__substg1.0_007D001F":
>                                                 try:
>                                                         attachments.append(msg.split("name=\"")[1].split("\"")[0])
>                                                 except:
>                                                         pass
>
>                 return attachments

I only just noticed this thread, and I had a similar problem. I took the
following approach:-

(I'm thinking this might be relevant as you mentioned checking whether
your client's Outlook could export .EML directly, which indicates (to
me at least) that you have some control over that...)

- Set up an IMAP email server on a machine (in this case linux and
dovecot)
- Got client to set up a new account in Outlook for the new server
- Got client to use the Outlook interface to copy relevant emails (or
the whole lot) to new server
- Used the standard imaplib and related modules to do what was needed
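A minimal sketch of that last step, assuming Python 3 and only the standard library. The helper names (attachment_names, list_attachments) and the host, user, password and mailbox arguments are hypothetical placeholders for illustration, not the code that was actually run:

```python
import email
import imaplib
from email import policy


def attachment_names(raw_message):
    """Return the filenames of the attachments in a raw RFC 822 message (bytes)."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # Skip attachment parts that carry no filename at all.
    return [part.get_filename() for part in msg.iter_attachments()
            if part.get_filename()]


def list_attachments(host, user, password, mailbox="INBOX"):
    """Yield (message number, attachment names) for every message in a mailbox.

    host, user, password and mailbox are placeholders for your own server.
    """
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(mailbox, readonly=True)
        status, data = conn.search(None, "ALL")
        for num in data[0].split():
            status, fetched = conn.fetch(num, "(RFC822)")
            # fetched[0] is a (envelope info, raw message bytes) tuple.
            yield num.decode(), attachment_names(fetched[0][1])
    finally:
        conn.logout()
```

Keeping the parsing in attachment_names separate from the IMAP traffic means it can be tried on any raw message without a server to hand.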

From my POV I didn't have to mess around with proprietary formats or
deal with files. From the client's POV, they were able to, with an
interface familiar to them, add/remove what needed processing. It also
enabled multiple people at the client's site to contribute their
emails that might have been relevant for the task.

The program created a sub-folder on the new server, did the
processing, and injected the results into that folder. The client could
then drag 'n' drop them to whatever folder they personally used for
filing at their end.
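Roughly, the "inject the results" part could look like the sketch below, again assuming Python 3 and the standard library. The function names, the sender address and the folder name are made up for illustration, and the hierarchy separator for sub-folders ("." or "/") depends on how the server is configured:

```python
import imaplib
import time
from email.message import EmailMessage


def build_result_message(subject, body, sender="processor@example.invalid"):
    """Build a plain-text message holding the processing results."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender  # placeholder address
    msg.set_content(body)
    return msg


def inject_result(conn, folder, msg):
    """Create `folder` if needed and APPEND `msg` to it.

    `conn` is an already logged-in imaplib.IMAP4 (or IMAP4_SSL) connection.
    """
    # CREATE answers "NO" if the folder already exists; that is harmless here.
    conn.create(folder)
    return conn.append(folder, None,
                       imaplib.Time2Internaldate(time.time()),
                       msg.as_bytes())
```

With a real connection this would be called as inject_result(conn, "Processed", build_result_message("Results", "...")), where "Processed" is whatever folder name suits the client.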

They felt in control, and I didn't have to bugger about with
maildir/mbox/pst/eml, whether it was outlook/thunderbird/evolution etc...

If you're only doing "an email here or email there" and don't want to
(or can't) go the full-blown mail server route, then a possible option
would be to mock an IMAP server (most likely using the Twisted
framework) that, upon an 'APPEND', processes the 'received' email
appropriately... (kind of a server/procmail route...)


Just a couple of ideas.

Cheers,

Jon.






