From vijay at accellion.com Tue Sep 4 06:27:04 2007
From: vijay at accellion.com (Vijay Rao)
Date: Tue, 04 Sep 2007 12:27:04 +0800
Subject: [Email-SIG] Parsing email with large attachment
Message-ID: <46dcde9d.15b38c0a.513a.0d0e@mx.google.com>

Hi,

I want to use the email package to parse emails with attachments of up
to 1GB. However, I find that Python crashes with a MemoryError traceback
while parsing an email with even a 300MB attachment, at this point in
feedparser.py:

    self._cur.set_payload(EMPTYSTRING.join(lines))

I have the email contents in a file, and the code looks like this
(on Python 2.5, WinXP):

    # Module-level imports used by the method body below
    import email
    import os

    self.msg = email.message_from_file(self.stream)
    ...
    ...
    # Check if there are any attachments at all
    if self.msg.get_content_maintype() != 'multipart':
        print 'No attachments in message'
        return

    # Number unnamed attachments across the whole message, so the
    # counter is initialised once, outside the loop
    counter = 1
    for part in self.msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        is_attachment = part.get('Content-Disposition')
        if is_attachment is None:
            #body = part.get_payload(decode=True)
            #print 'Body' , body
            continue
        filename = part.get_filename()
        print 'Filename', filename
        if not filename:
            # Fall back to a generated name such as part-001.bin
            filename = 'part-%03d%s' % (counter, '.bin')
            counter += 1
        att_path = os.path.join(detach_dir, filename)
        # Check if it's already there
        if not os.path.isfile(att_path):
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()

My machine has 2GB of RAM, so total memory should not be the problem; it
seems Python tries to allocate one large chunk of memory while joining
the list of lines. It also seems that peak memory use for parsing and
extracting the attachment is three times the attachment size:

1) 2x used for parsing
2) 1x used for extracting it

The only way to fix this seems to be rewriting the parser to not load
the attachment into memory at all and maybe write it to a file, pass
the file pointer to set_payload and decode the attachment in small
chunks in get_payload instead of loading the entire file.
Subclass message to accept a file pointer in set_payload, etc...

Is there any other way to fix it, maybe by compiling Python with some
flags that allow list concatenation to access a larger amount of memory?

Thanks,
Vijay

From barry at python.org Tue Sep 4 13:51:53 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 4 Sep 2007 07:51:53 -0400
Subject: [Email-SIG] Parsing email with large attachment
In-Reply-To: <46dcde9d.15b38c0a.513a.0d0e@mx.google.com>
References: <46dcde9d.15b38c0a.513a.0d0e@mx.google.com>
Message-ID: <7F981B77-E95F-46F8-A92C-556558A7C93B@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 4, 2007, at 12:27 AM, Vijay Rao wrote:

> The only way to fix this seems to be rewriting the parser to not load
> the attachment into memory at all and maybe write it to a file , pass
> the file pointer to set_payload and decode the attachment in small
> chunks in get_payload instead of loading the entire file.
> Subclass message to accept a file pointer in set_payload, etc...

We've long talked about adding an API to allow the parser to store
attachment data externally instead of in memory. We've never gotten
past the "yes, that would be a good idea" stage though. Care to
propose an API and work on an implementation?
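
To make the "decode the attachment in small chunks" idea from the quoted
text concrete, here is a minimal sketch. It is not part of the email
package and not a proposed API: the helper name decode_base64_stream and
its chunk_size parameter are made up for illustration, and it is written
for Python 3 rather than the 2.5 used in the thread. It assumes the
base64 body of an attachment has already been written to a file-like
object, and copies the decoded bytes to another file while holding only
one chunk in memory at a time.

```python
import base64


def decode_base64_stream(src, dst, chunk_size=64 * 1024):
    """Decode base64 data from src into dst without loading it all.

    src and dst are binary file-like objects (hypothetical inputs, e.g.
    a temp file holding the encoded attachment and the output file).
    Returns the number of decoded bytes written.
    """
    pending = b""  # encoded characters not yet forming a full 4-char group
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Remove line breaks and other whitespace so that the 4-character
        # base64 groups stay aligned across chunk boundaries.
        pending += b"".join(chunk.split())
        usable = len(pending) - (len(pending) % 4)
        if usable:
            data = base64.b64decode(pending[:usable])
            dst.write(data)
            written += len(data)
            pending = pending[usable:]
    if pending:
        # Leftover characters mean the input ended in the middle of a group.
        raise ValueError("truncated base64 data")
    return written
```

With a helper along these lines, peak memory stays near chunk_size
regardless of attachment size; the open design question is still how the
parser would hand the encoded data to an external store in the first
place.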

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRt1G2XEjvBPtnXfVAQJ6nwQAqR8XlNL9cg2Q+2sgGv740PkTtBNPenqQ
IVIu2MGHJYibvM7LrHF24MMEnXi80t1+JUQff/HhAn9jTjF2N02jtS+q/nigSyY/
08+YNmud9vgaOrGOOd1HAYIkYiCiv2YBUbhetJnsoV9dWS24Psp445qJl6/NvtdD
fh2Ipz9cfys=
=MIrO
-----END PGP SIGNATURE-----

From vijay at accellion.com Thu Sep 6 04:44:15 2007
From: vijay at accellion.com (E Vijay Rao)
Date: Thu, 06 Sep 2007 10:44:15 +0800
Subject: [Email-SIG] Parsing email with large attachment
In-Reply-To: <7F981B77-E95F-46F8-A92C-556558A7C93B@python.org>
References: <46dcde9d.15b38c0a.513a.0d0e@mx.google.com> <7F981B77-E95F-46F8-A92C-556558A7C93B@python.org>
Message-ID: <46df6983.24b48c0a.4c96.05e7@mx.google.com>

Hi,

Yes, I would like to propose an API and work on the implementation.
Any pointers on where to get started?

Vijay

At 07:51 PM 9/4/2007, Barry Warsaw wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On Sep 4, 2007, at 12:27 AM, Vijay Rao wrote:
>
>>The only way to fix this seems to be rewriting the parser to not load
>>the attachment into memory at all and maybe write it to a file , pass
>>the file pointer to set_payload and decode the attachment in small
>>chunks in get_payload instead of loading the entire file.
>>Subclass message to accept a file pointer in set_payload, etc...
>
>We've long talked about adding an API to allow the parser to store
>attachment data externally instead of in memory. We've never gotten
>past the "yes, that would be a good idea" stage though. Care to
>propose an API and work on an implementation?
>
>- -Barry
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.4.7 (Darwin)
>
>iQCVAwUBRt1G2XEjvBPtnXfVAQJ6nwQAqR8XlNL9cg2Q+2sgGv740PkTtBNPenqQ
>IVIu2MGHJYibvM7LrHF24MMEnXi80t1+JUQff/HhAn9jTjF2N02jtS+q/nigSyY/
>08+YNmud9vgaOrGOOd1HAYIkYiCiv2YBUbhetJnsoV9dWS24Psp445qJl6/NvtdD
>fh2Ipz9cfys=
>=MIrO
>-----END PGP SIGNATURE-----

From mpant at ncsa.uiuc.edu Sun Sep 30 16:59:50 2007
From: mpant at ncsa.uiuc.edu (Meenal Pant)
Date: Sun, 30 Sep 2007 09:59:50 -0500
Subject: [Email-SIG] PGP MIME headers
Message-ID: <46FFB9E6.2070804@ncsa.uiuc.edu>

Hello all,

How can I create PGP MIME headers using email? I want to create a
PGP MIME email with the message body signed using GnuPG.

Thanks,
Meenal
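
One possible shape for such a message, as a rough sketch only: RFC 3156
describes a signed PGP/MIME mail as a multipart/signed container whose
first part is the signed body and whose second part is an
application/pgp-signature part. The code below builds that structure
with the email package on Python 3. The names build_pgp_mime and
make_signature are hypothetical, and the signing itself is left to a
caller-supplied function; a real signature would have to come from
GnuPG producing an ASCII-armored detached signature over the serialized
body part (with the line-ending canonicalization RFC 3156 requires),
which this sketch does not attempt.

```python
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_pgp_mime(body_text, make_signature):
    """Assemble a multipart/signed message in the RFC 3156 layout.

    make_signature is a hypothetical callable: it receives the serialized
    body part as a string and returns the ASCII-armored detached
    signature as a string (e.g. by invoking GnuPG externally).
    """
    # First part: the content that gets signed.
    body = MIMEText(body_text, "plain", "us-ascii")

    # The signature covers the body part including its MIME headers.
    signature_text = make_signature(body.as_string())

    # Second part: the detached signature, carried as plain ASCII text.
    signature = MIMEBase("application", "pgp-signature")
    signature.set_payload(signature_text)

    # Container: extra keyword arguments become Content-Type parameters.
    # micalg="pgp-sha1" assumes the signature was made with SHA-1.
    outer = MIMEMultipart(
        "signed",
        micalg="pgp-sha1",
        protocol="application/pgp-signature",
    )
    outer.attach(body)
    outer.attach(signature)
    return outer
```

Calling build_pgp_mime("Hello\n", signer) and then as_string() on the
result shows the resulting headers; whether a mail client accepts the
signature depends entirely on what the signer function returns and on
the body being serialized identically when it is signed and when it is
sent.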