From skip at pobox.com Thu May 21 18:23:10 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 21 May 2009 11:23:10 -0500 (CDT)
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
Message-ID: <20090521162310.63FFA10859E5@montanaro.dyndns.org>

(I posted this earlier to python-list at python.org then remembered we have an
email package sig. I hope it's ok to ask usage questions here...)

I have a script which allows me to generate MIME messages with appropriate
attachments. It's essentially a lightly modified version of the second
example from this page of the email package docs:

    http://docs.python.org/library/email-examples.html

I want to modify my script to automatically zip or gzip files which exceed
some size threshold. Doing the zip/gzip dance is no problem. I'm concerned
about how to specify that properly with the email package. For example,
consider a large CSV file. I figure out the MIME type is text/csv. Now
suppose I gzip the file before attaching it. How would this code change to
specify the compression where "path" is now compressed?

    if maintype == 'text':
        fp = open(path)
        # Note: we should handle calculating the charset
        msg = MIMEText(fp.read(), _subtype=subtype)
        fp.close()

I guess I'm asking if I can have the Content-Type still be text/csv with
some other MIME header indicating the file is compressed. If so, how do I
achieve that when attaching the compressed file to the message?

Thanks,

-- 
Skip Montanaro - skip at pobox.com - http://www.smontanaro.net/
    America's vaunted "free press" notwithstanding, story ideas that expose
    the unseemly side of actual or potential advertisers tend to fall by the
    wayside. Not quite sure why. -- Jim Thornton

From phd at phd.pp.ru Thu May 21 18:36:41 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 21 May 2009 20:36:41 +0400
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
Message-ID: <20090521163641.GG22535@phd.pp.ru>

On Thu, May 21, 2009 at 11:23:10AM -0500, skip at pobox.com wrote:
> I guess I'm asking if I can have the Content-Type still be text/csv with
> some other MIME header indicating the file is compressed. If so, how do I
> achieve that when attaching the compressed file to the message?

Content-Encoding: gzip

AFAIR

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru Thu May 21 19:31:47 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 21 May 2009 21:31:47 +0400
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521163641.GG22535@phd.pp.ru>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
Message-ID: <20090521173147.GH22535@phd.pp.ru>

On Thu, May 21, 2009 at 08:36:41PM +0400, Oleg Broytmann wrote:
> On Thu, May 21, 2009 at 11:23:10AM -0500, skip at pobox.com wrote:
> > I guess I'm asking if I can have the Content-Type still be text/csv with
> > some other MIME header indicating the file is compressed. If so, how do I
> > achieve that when attaching the compressed file to the message?
>
> Content-Encoding: gzip

Sorry, it is for web. So it seems you need

Content-Type: application/x-gzip

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From skip at pobox.com Thu May 21 20:06:56 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 21 May 2009 13:06:56 -0500
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521163641.GG22535@phd.pp.ru>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
Message-ID: <18965.38976.165443.986664@montanaro.dyndns.org>

    >> I guess I'm asking if I can have the Content-Type still be text/csv
    >> with some other MIME header indicating the file is compressed. If
    >> so, how do I achieve that when attaching the compressed file to the
    >> message?

    Oleg> Content-Encoding: gzip

Thanks. I'm not having much success with it, though generating messages
seems to work ok. Maybe I can't For example, Thunderbird insists on
displaying the raw gzipped data (not the base64-encoded version which is in
the attachment) and doesn't gunzip such attachments when I try to save them.
Is it only supposed to undo the content-transfer-encoding?

Here's the start of one of my attachments:

    Content-Type: text/csv; charset="us-ascii"
    MIME-Version: 1.0
    Content-Encoding: gzip
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename="somefile.csv"

    H4sIABKKFUoAA8z9y5JmuY6mB891LZluJMFjzTTZo99kkiznbVvqUve2v7tKVlWmw90LWAS5DiR4
    WJ9neGZ1Z8YOjwh3f4IEX4Dgi//89//459/+4x///Z9/+5e/47/+8Z//n9/+29///T/+07//9l//
    8V/+K/7nv/3r/43//t/+/ts//z//6d//K/37v/3rb//29//4x7/iT/73f/xnlf+j8T//9p/+jf9j
    fvs//v6///bv//gv+FuPf/+nv/9f/4y/99/+47d//pf/+Lf/F3/4j//47f/813/HP+n/Pj6h+u3/
    /Ld//r/U8WN9/Fj/9i//x7/967/8x2//8r/9/X////8PRqn0u3K/6/ibiv9k3T8p/dv/+D/+z/+/
    3/8X/Oef/td//Jd//m//BOq//+NffvtN/wZfHlIyiX6QjA+WfqCT9iF/CGI8fgDRHz8IIQX4TX2p
    ...

Thx,

Skip

From skip at pobox.com Thu May 21 20:08:07 2009
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 21 May 2009 13:08:07 -0500
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521173147.GH22535@phd.pp.ru>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
 <20090521173147.GH22535@phd.pp.ru>
Message-ID: <18965.39047.475452.875713@montanaro.dyndns.org>

    Oleg> Sorry, it is for web.
    Oleg> So it seems you need
    Oleg> Content-Type: application/x-gzip

Alas, then I lose the notion that the file is actually text/csv. For
example, how would Outlook or Thunderbird know to open such a file in Excel
or OpenOffice Calc?

Skip

From phd at phd.pp.ru Thu May 21 20:17:23 2009
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 21 May 2009 22:17:23 +0400
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <18965.39047.475452.875713@montanaro.dyndns.org>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
 <20090521173147.GH22535@phd.pp.ru>
 <18965.39047.475452.875713@montanaro.dyndns.org>
Message-ID: <20090521181723.GI22535@phd.pp.ru>

On Thu, May 21, 2009 at 01:08:07PM -0500, skip at pobox.com wrote:
> Oleg> Sorry, it is for web. So it seems you need
> Oleg> Content-Type: application/x-gzip
>
> Alas, then I lose the notion that the file is actually text/csv. For
> example, how would Outlook or Thunderbird know to open such a file in Excel
> or OpenOffice Calc?

I think it unzips and then guesses the filetype of the unzipped file.
Guessing is a widely deployed trick in software, alas.

Web headers are better in this regard as there are separate Content-Type
and Content-Encoding headers. In email, there is a Content-Transfer-Encoding
header but it has a different meaning - it is either 8bit, or base64, or
quoted-printable - it's the transfer encoding *after* the file has been
(g)zipped.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From rdmurray at bitdance.com Thu May 21 20:30:46 2009
From: rdmurray at bitdance.com (R. David Murray)
Date: Thu, 21 May 2009 14:30:46 -0400 (EDT)
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <18965.39047.475452.875713@montanaro.dyndns.org>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
 <20090521173147.GH22535@phd.pp.ru>
 <18965.39047.475452.875713@montanaro.dyndns.org>
Message-ID:

On Thu, 21 May 2009 at 13:08, skip at pobox.com wrote:
> Alas, then I lose the notion that the file is actually text/csv. For
> example, how would Outlook or Thunderbird know to open such a file in Excel
> or OpenOffice Calc?

By using whatever heuristics it uses for such a file on disk (including
the filename extension), I would imagine. Suboptimal, but lacking
a compression standard for email-MIME, it is probably the best you can do.

(At least, I couldn't find any mention of a standard for compression
other than a note about its non-existence in the rather old
comp.mail.mime FAQ.)

--David

From tonynelson at georgeanelson.com Thu May 21 20:04:47 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Thu, 21 May 2009 14:04:47 -0400
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
Message-ID:

At 11:23 -0500 05/21/2009, skip at pobox.com wrote:
>(I posted this earlier to python-list at python.org then remembered we have an
>email package sig. I hope it's ok to ask usage questions here...)
>
>I have a script which allows me to generate MIME messages with appropriate
>attachments. It's essentially a lightly modified version of the second
>example from this page of the email package docs:
>
>    http://docs.python.org/library/email-examples.html
>
>I want to modify my script to automatically zip or gzip files which exceed
>some size threshold. Doing the zip/gzip dance is no problem. I'm concerned
>about how to specify that properly with the email package. For example,
>consider a large CSV file. I figure out the MIME type is text/csv.
>Now suppose I gzip the file before attaching it. How would this code
>change to specify the compression where "path" is now compressed?
>
>    if maintype == 'text':
>        fp = open(path)
>        # Note: we should handle calculating the charset
>        msg = MIMEText(fp.read(), _subtype=subtype)
>        fp.close()
>
>I guess I'm asking if I can have the Content-Type still be text/csv with
>some other MIME header indicating the file is compressed. If so, how do I
>achieve that when attaching the compressed file to the message?

I think (untested):

    if maintype == 'text':
        fp = open(path)
        data = fp.read()
        fp.close()
        name = os.path.basename(path)
        if len(data) > datamax:
            # do the zip/gzip compression to data
            # set _subtype to 'zip' or 'x-gzip', or omit for octet-stream
            msg = MIMEApplication(data, _subtype='zip',
                _encoder=email.encoders.encode_base64, name=name)
        else:
            msg = MIMEText(data)
        del msg['Content-Disposition']  # paranoia
        msg.add_header('Content-Disposition',
            'attachment',  # or 'inline' and omit the name
            filename=name)

This will set the Content-Type: to "application/zip", reflecting the actual
type, encode to Base64 so the payload is ASCII, also setting the
Content-Transfer-Encoding, set a default name, and tell the MUA whether to
try to display the payload or save it as a file (also setting a default
name).
-- 
____________________________________________________________________
TonyN.:'                                                           '

From mark at msapiro.net Thu May 21 21:16:36 2009
From: mark at msapiro.net (Mark Sapiro)
Date: Thu, 21 May 2009 12:16:36 -0700
Subject: [Email-SIG] Generating zipped or gzipped attachment with emailpackage?
In-Reply-To:
Message-ID:

R. David Murray wrote:

>On Thu, 21 May 2009 at 13:08, skip at pobox.com wrote:
>> Alas, then I lose the notion that the file is actually text/csv. For
>> example, how would Outlook or Thunderbird know to open such a file in Excel
>> or OpenOffice Calc?
>
>By using whatever heuristics it uses for such a file on disk (including
>the filename extension), I would imagine.
>Suboptimal, but lacking a compression standard for email-MIME, it is
>probably the best you can do.

Ideally, one would be able to specify a parameter on the Content-Type:
header along the lines of

Content-Type: text/csv; charset="utf-8"; compression="gzip"

The MIME standards allow for such parameter extensions, but since I just
made that one up, no MUA is going to recognize it :(

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From stephen at xemacs.org Fri May 22 07:24:43 2009
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 22 May 2009 14:24:43 +0900
Subject: [Email-SIG] Generating zipped or gzipped attachment with emailpackage?
In-Reply-To:
References:
Message-ID: <8763ftg15w.fsf@uwakimon.sk.tsukuba.ac.jp>

Mark Sapiro writes:

 > Ideally, one would be able to specify a parameter on the Content-Type:
 > header along the lines of
 >
 > Content-Type: text/csv; charset="utf-8"; compression="gzip"

No, I think this is really a content transfer encoding, not part of
Content-Type, and I don't see why one would be enough. Nor would it
necessarily always be compression. So how about a
Content-Transfer-Filter header which resolves to an (order-sensitive!)
list of transformations:

Content-Transfer-Filter: pgp-encrypted; algorithm=idea; order=3
Content-Transfer-Filter: x-xz; order=2;
    comment="the successor to LZMA"; alternate-application=x-lzma
Content-Transfer-Filter: base64; order=1

Order is "decoding order" here. Otherwise you'd need a parameter to
determine which to use first (in case of corruption or reordering by
some brain-damaged MUA or MTA).

In the presence of a Content-Transfer-Encoding header, the
Content-Transfer-Encoding should be applied first, then any
Content-Transfer-Filters.

From stephen at xemacs.org Fri May 22 07:28:21 2009
From: stephen at xemacs.org (Stephen J.
 Turnbull)
Date: Fri, 22 May 2009 14:28:21 +0900
Subject: [Email-SIG] Generating zipped or gzipped attachment with email package?
In-Reply-To: <20090521181723.GI22535@phd.pp.ru>
References: <20090521162310.63FFA10859E5@montanaro.dyndns.org>
 <20090521163641.GG22535@phd.pp.ru>
 <20090521173147.GH22535@phd.pp.ru>
 <18965.39047.475452.875713@montanaro.dyndns.org>
 <20090521181723.GI22535@phd.pp.ru>
Message-ID: <874ovdg0zu.fsf@uwakimon.sk.tsukuba.ac.jp>

Oleg Broytmann writes:
 > Skip wrote:

 > > Alas, then I lose the notion that the file is actually text/csv. For
 > > example, how would Outlook or Thunderbird know to open such a file in Excel
 > > or OpenOffice Calc?
 >
 > I think it unzips and then guesses the filetype of the unzipped file.

Indeed.

 > Guessing is a widely deployed trick in software, alas.
 > Web headers are better in this regard as there are separate
 > Content-Type and Content-Encoding headers.

No, they're not "better", except by accident of being able to deal
with binary data by default. They suffer from the same problem that
only one transformation can be applied. Of course this is usually
enough, but not always.
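
For reference, the approach the thread converges on (label the part as the compressed type it really is, and let the ".csv.gz" filename hint at the original type) might be sketched roughly as follows. This is only an illustration under assumptions, not anyone's tested code from the thread: it targets Python 3, the helper name make_csv_part and the DATAMAX threshold are made up here, and it takes the file contents as bytes rather than reading from a path.

```python
import gzip
import os
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText

DATAMAX = 100 * 1024  # hypothetical size threshold, in bytes


def make_csv_part(path, data, datamax=DATAMAX):
    """Return a MIME part for CSV bytes, gzipped when larger than datamax.

    make_csv_part is an illustrative helper, not part of the email package.
    """
    name = os.path.basename(path)
    if len(data) > datamax:
        # MIME has no header meaning "gzipped text/csv", so describe the
        # part as gzip data and keep ".csv" in the filename as a hint.
        # 'gzip' is the subtype registered by RFC 6713; the thread's
        # 'x-gzip' predates that registration.
        name += ".gz"
        part = MIMEApplication(gzip.compress(data), _subtype="gzip")
    else:
        # Assumes the CSV is UTF-8; a real script should detect the charset.
        part = MIMEText(data.decode("utf-8"), _subtype="csv", _charset="utf-8")
    part.add_header("Content-Disposition", "attachment", filename=name)
    return part
```

MIMEApplication base64-encodes its payload by default, so the Content-Transfer-Encoding is taken care of separately from the compression, which matches the distinction Oleg draws above. Whether a given MUA then opens somefile.csv.gz in a spreadsheet still depends on that client's own guessing.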