From aaron.fransen at gmail.com Wed Mar 4 23:30:50 2009
From: aaron.fransen at gmail.com (Aaron Fransen)
Date: Wed, 4 Mar 2009 15:30:50 -0700
Subject: [Email-SIG] Problems with quoted-printable attachment
Message-ID: <63ad7ef70903041430n21386c04lfdcca78b721fce2c@mail.gmail.com>

I have a MIME email I've received generated by Microsoft Windows Mail
6.0.6001.18000.

In it are two PDF documents encoded using quoted-printable.

No matter what I do, or what method I try, I can't seem to decode the darned
attachments properly! Yet Outlook has no issues decoding. I've tried quopri,
email, even wrote my own decoder to see what I could figure out. All of them
generate a file exactly 150 bytes shorter than the version Outlook
generated.

Even looking at the two files side-by-side (using Crimson Editor, probably
not ideal for binary files) I can't see any differences.

Any thoughts folks?

Aaron

From mark at msapiro.net Thu Mar 5 00:00:18 2009
From: mark at msapiro.net (Mark Sapiro)
Date: Wed, 4 Mar 2009 15:00:18 -0800
Subject: [Email-SIG] Problems with quoted-printable attachment
In-Reply-To: <63ad7ef70903041430n21386c04lfdcca78b721fce2c@mail.gmail.com>
Message-ID:

Aaron Fransen wrote:
>
>I have a MIME email I've received generated by Microsoft Windows Mail
>6.0.6001.18000.
>
>In it are two PDF documents encoded using quoted-printable.
>
>No matter what I do, or what method I try, I can't seem to decode the darned
>attachments properly! Yet Outlook has no issues decoding. I've tried quopri,
>email, even wrote my own decoder to see what I could figure out. All of them
>generate a file exactly 150 bytes shorter than the version Outlook
>generated.

Are they missing a \r at the end of each of 150 lines?

If that's the difference, I'm not sure that there's anything you can do
about it as the sending MUA is not properly encoding the data. I.e.
if a <LF> vs a <CRLF> line terminator is significant, I think the data
should be base64 encoded. I know Outlook and maybe other Microsoft
MUAs do encode some PDFs as quoted-printable, but I suspect this is
wrong.

OTOH, if some data are quoted-printable encoded as

something=0D=0A=
or other

or equivalent, that should decode as

something\r\nor other

and if it is decoded as

something\nor other

then the decoding is wrong

--
Mark Sapiro        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From aaron.fransen at gmail.com Thu Mar 5 15:16:49 2009
From: aaron.fransen at gmail.com (Aaron Fransen)
Date: Thu, 5 Mar 2009 07:16:49 -0700
Subject: [Email-SIG] Problems with quoted-printable attachment
In-Reply-To:
References: <63ad7ef70903041430n21386c04lfdcca78b721fce2c@mail.gmail.com>
Message-ID: <63ad7ef70903050616q356c5f7ck712e2352bfb6d487@mail.gmail.com>

You are correct. I figured it out about 10 minutes after I sent my email.
Microsoft is sending LF instead of CR/LF. A simple string-replace fixed the
problem.

Found it on some page talking about how Microsoft likes to break the
standards...

Thanks Mark!

On Wed, Mar 4, 2009 at 4:00 PM, Mark Sapiro wrote:

> Aaron Fransen wrote:
> >
> >I have a MIME email I've received generated by Microsoft Windows Mail
> >6.0.6001.18000.
> >
> >In it are two PDF documents encoded using quoted-printable.
> >
> >No matter what I do, or what method I try, I can't seem to decode the darned
> >attachments properly! Yet Outlook has no issues decoding. I've tried quopri,
> >email, even wrote my own decoder to see what I could figure out. All of them
> >generate a file exactly 150 bytes shorter than the version Outlook
> >generated.
>
>
> Are they missing a \r at the end of each of 150 lines?
>
> If that's the difference, I'm not sure that there's anything you can do
> about it as the sending MUA is not properly encoding the data. I.e.
> if a <LF> vs a <CRLF> line terminator is significant, I think the data
> should be base64 encoded. I know Outlook and maybe other Microsoft
> MUAs do encode some PDFs as quoted-printable, but I suspect this is
> wrong.
>
> OTOH, if some data are quoted-printable encoded as
>
> something=0D=0A=
> or other
>
> or equivalent, that should decode as
>
> something\r\nor other
>
> and if it is decoded as
>
> something\nor other
>
> then the decoding is wrong
>
> --
> Mark Sapiro        The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>

From mark at msapiro.net Thu Mar 5 18:05:56 2009
From: mark at msapiro.net (Mark Sapiro)
Date: Thu, 5 Mar 2009 09:05:56 -0800
Subject: [Email-SIG] OT Problems with quoted-printable attachment
In-Reply-To: <63ad7ef70903050616q356c5f7ck712e2352bfb6d487@mail.gmail.com>
Message-ID:

Aaron Fransen wrote:

>You are correct. I figured it out about 10 minutes after I sent my email.
>Microsoft is sending LF instead of CR/LF. A simple string-replace fixed the
>problem.
>
>Found it on some page talking about how Microsoft likes to break the
>standards...

In my more cynical moments, I think that Microsoft's goal is to have
all Microsoft products interoperate with each other and with nothing
else.

--
Mark Sapiro        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

From tonynelson at georgeanelson.com Mon Mar 30 22:10:00 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Mon, 30 Mar 2009 16:10:00 -0400
Subject: [Email-SIG] Hello
Message-ID:

I'm Tony Nelson. I've been using Python for several years and the email
package for a couple of years. I have one patch in Python for the socket
module ("[issue1519025] New ver. of 1102879: Fix for 926423: socket
timeouts"). Recently I've found that some odd things I was blaming on
other causes are bugs in the email package.
I'll be filing issues with patches.

Currently I've filed "[issue5610] email feedparser.py CRLFLF bug: $ vs \Z".
Repeatedly parsing and saving multipart messages was chewing off the
trailing lines from submessage bodies. I'd like a procedural review of
that issue and its attached files before I file more issues, so that I do
them properly.

Next is probably a fix and test for "[issue1721862]
email.FeedParser.BufferedSubFile improperly handles '\r\n'" (when split
across calls to .feed()). The error should be rare, only happening about
every 8K messages for messages longer than 8K when parsed via
parser.parse() or parser.parsestr() or email.message_from_file() or
email.message_from_string(). The fix is for .push() to treat a last line
ending with \r as ._partial. The test will call feedparser.feed()
directly, so it can use short messages and not depend on the buffer size in
parser.parse().

I might be persuaded to review or fix other open issues.
--
____________________________________________________________________
TonyN.:'
      '

From barry at python.org Mon Mar 30 23:41:13 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 30 Mar 2009 16:41:13 -0500
Subject: [Email-SIG] Hello
In-Reply-To:
References:
Message-ID: <86F10D57-A6C2-48CE-A045-A126531DE24F@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mar 30, 2009, at 3:10 PM, Tony Nelson wrote:

> I'm Tony Nelson. I've been using Python for several years and the email
> package for a couple of years. I have one patch in Python for the socket
> module ("[issue1519025] New ver. of 1102879: Fix for 926423: socket
> timeouts"). Recently I've found that some odd things I was blaming on
> other causes are bugs in the email package. I'll be filing issues with
> patches.

Hi Tony, welcome to the email sig! I'm actually sprinting on the
email package today at Pycon. Chris Withers joined me until he had to
fly home.
Bug 1974 was Chris's particular itch and I now think I have
a fix for this that isn't horrible, though unfortunately it will only
land in 2.7 and shouldn't be back ported.

> Currently I've filed "[issue5610] email feedparser.py CRLFLF bug: $ vs \Z".
> Repeatedly parsing and saving multipart messages was chewing off the
> trailing lines from submessage bodies. I'd like a procedural review of
> that issue and its attached files before I file more issues, so that I do
> them properly.
>
> Next is probably a fix and test for "[issue1721862]
> email.FeedParser.BufferedSubFile improperly handles '\r\n'" (when split
> across calls to .feed()). The error should be rare, only happening about
> every 8K messages for messages longer than 8K when parsed via
> parser.parse() or parser.parsestr() or email.message_from_file() or
> email.message_from_string(). The fix is for .push() to treat a last line
> ending with \r as ._partial. The test will call feedparser.feed()
> directly, so it can use short messages and not depend on the buffer size in
> parser.parse().
>
> I might be persuaded to review or fix other open issues.

Very cool, thanks. I'll look at the above issues after I land the
patch for 1974.

My plan for the email package is:

* Fix what we can for Python 2.7 but be very conservative with back
  ports to 2.6
* Ignore 3.0
* Work on a new API so that we can actually fix the horrible
  brokenness of email in Python 3.
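
As an aside on the issue1721862 fix quoted above (hold back a chunk that ends in \r until the next .feed() call), here is a minimal sketch of the idea. LineBuffer and _LINE are made-up names for illustration; this is not the email package's BufferedSubFile, and the actual patch may well differ.

```python
import re

# One "line" is text up to and including "\r\n", "\r" or "\n",
# or else an unterminated tail at the end of the data.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


class LineBuffer:
    """Hypothetical feed-style line splitter (illustration only)."""

    def __init__(self):
        self._partial = ""  # text held back until its line ending is known
        self.lines = []     # complete lines, terminators preserved

    def push(self, data):
        parts = _LINE.findall(self._partial + data)
        self._partial = ""
        # Hold back a last piece that is unterminated OR ends in a bare
        # "\r": the matching "\n" may arrive in the next chunk.
        if parts and not parts[-1].endswith("\n"):
            self._partial = parts.pop()
        self.lines.extend(parts)

    def close(self):
        # At end of input whatever is left counts as a final line.
        if self._partial:
            self.lines.append(self._partial)
            self._partial = ""


buf = LineBuffer()
buf.push("one\r")        # chunk boundary falls between "\r" and "\n"
buf.push("\ntwo\r\n")
buf.close()
print(buf.lines)         # ['one\r\n', 'two\r\n']
```

A splitter that treated the trailing "\r" of the first chunk as a finished line would instead produce 'one\r', then '\n', i.e. a spurious extra line.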

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdE8eXEjvBPtnXfVAQLxnwP+JgOPzMyy/d41SQLYAgnJWkJLNfmrHmq6
KkgyCC2drzZdd1lZvK5IuiGKEYmS0kQZF/dHUviXkqZgW2OUIp40zB59gbCg8AYD
xAP21n+H/3bpD+xMuo3rbUh5Ft1GAsx/QGZQUUM1jyhlPU/xEY7QzbSVOf6L7xId
Na5W/CZwEpE=
=pS1Z
-----END PGP SIGNATURE-----

From tonynelson at georgeanelson.com Tue Mar 31 00:46:41 2009
From: tonynelson at georgeanelson.com (Tony Nelson)
Date: Mon, 30 Mar 2009 18:46:41 -0400
Subject: [Email-SIG] Hello
In-Reply-To: <86F10D57-A6C2-48CE-A045-A126531DE24F@python.org>
References: <86F10D57-A6C2-48CE-A045-A126531DE24F@python.org>
Message-ID:

At 16:41 -0500 2009/03/30, Barry Warsaw wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On Mar 30, 2009, at 3:10 PM, Tony Nelson wrote:
>
>>I'm Tony Nelson. ...
 ...
>Hi Tony, welcome to the email sig!

Thank you.

>I'm actually sprinting on the
>email package today at Pycon. Chris Withers joined me until he had to
>fly home. Bug 1974 was Chris's particular itch and I now think I have
>a fix for this that isn't horrible, though unfortunately it will only
>land in 2.7 and shouldn't be back ported.

A worthy issue. Hopefully header parsing and generation can be cleaned up
more before 2.7/3.1 so that proper RFC2822 2.2.3 folding can be the norm.
For example, unstructured header fields such as Subject: have whitespace as
part of the unstructured token, and structured fields can skip whitespace,
so leading whitespace should not be stripped by
FeedParser._parse_headers(). This would help with idempotency.

>>Currently I've filed "[issue5610] email feedparser.py CRLFLF bug: $ vs
>>\Z". ...
 ...
>> I might be persuaded to review or fix other open issues.
>
>Very cool, thanks. I'll look at the above issues after I land the
>patch for 1974.

Ack. Only one issue yet [issue5610]; I want to know if I'm doing it right
before filing others.
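
For readers who have not run into the "$ vs \Z" distinction named in the issue5610 title: in Python's re module, "$" matches at the end of the string and also just before a newline that is the last character, while "\Z" matches only at the very end. The sketch below shows how that changes which line terminator gets matched on a CRLF+LF tail. The pattern names and the slicing step are assumptions for illustration, not a quote of feedparser.py or of the attached patch.

```python
import re

# Same alternation, different end anchors.
EOL_DOLLAR = re.compile(r"(\r\n|\r|\n)$")    # "$" also matches before a final "\n"
EOL_END = re.compile(r"(\r\n|\r|\n)\Z")      # "\Z" matches only at end of string

line = "body text\r\n\n"                     # CRLF followed by a bare LF

m_dollar = EOL_DOLLAR.search(line)
m_end = EOL_END.search(line)

print(repr(m_dollar.group(1)))               # '\r\n'  (the LF after it is left over)
print(repr(m_end.group(1)))                  # '\n'    (the true last terminator)

# Stripping "the matched terminator" by length then removes different text:
stripped_dollar = line[:-len(m_dollar.group(1))]
stripped_end = line[:-len(m_end.group(1))]
print(repr(stripped_dollar))                 # 'body text\r'
print(repr(stripped_end))                    # 'body text\r\n'
```

With "$" the two-character match is cut from the wrong end of the data, which is the kind of off-by-a-terminator loss that can add up over repeated parse/save cycles.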

>My plan for the email package is:
>
>* Fix what we can for Python 2.7 but be very conservative with back
>ports to 2.6
>* Ignore 3.0
>* Work on a new API so that we can actually fix the horrible
>brokenness of email in Python 3.

Hmm, I haven't used Python 3 yet, and didn't know about that. I suppose it
is due to bytes/unicode confusion?

There should be an "obvious" place for users to get a current email package
suitable for at least the last few Python 2.x, at least if it starts
getting more love again. I don't know quite where that should be, whether
a SourceForge (or similar) page, a listing on PyPI, both, or what. Just
something simpler than a SVN checkout.
--
____________________________________________________________________
TonyN.:'
      '

From barry at python.org Tue Mar 31 05:03:57 2009
From: barry at python.org (Barry Warsaw)
Date: Mon, 30 Mar 2009 22:03:57 -0500
Subject: [Email-SIG] Hello
In-Reply-To:
References: <86F10D57-A6C2-48CE-A045-A126531DE24F@python.org>
Message-ID: <9A8B29F0-32C1-4E29-9886-968701DD3C42@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Commenting on parts of your message, since I haven't looked at 5610 yet.

On Mar 30, 2009, at 5:46 PM, Tony Nelson wrote:

> A worthy issue. Hopefully header parsing and generation can be cleaned up
> more before 2.7/3.1 so that proper RFC2822 2.2.3 folding can be the norm.
> For example, unstructured header fields such as Subject: have whitespace as
> part of the unstructured token, and structured fields can skip whitespace,
> so leading whitespace should not be stripped by
> FeedParser._parse_headers(). This would help with idempotency.

While I completely agree with you here, I don't think it will be
possible to fix this in Python 2.7. That doesn't mean that we can't
provide a working email package for Python 2.x though. I think doing
structured folding will require API changes and I think I know the API
I want. I'm trying to get some cycles to write about it or create
some working code.

>> My plan for the email package is:
>>
>> * Fix what we can for Python 2.7 but be very conservative with back
>> ports to 2.6
>> * Ignore 3.0
>> * Work on a new API so that we can actually fix the horrible
>> brokenness of email in Python 3.
>
> Hmm, I haven't used Python 3 yet, and didn't know about that. I suppose it
> is due to bytes/unicode confusion?

Yes. The email package has a really broken notion of bytes vs. text.
Grep for raw-unicode-escape for the brain-hurty. Fixing this too
really requires an API change, and again I've talked with folks so I
think I know where to go with this. I can haz free hacking cycles?

> There should be an "obvious" place for users to get a current email package
> suitable for at least the last few Python 2.x, at least if it starts
> getting more love again. I don't know quite where that should be, whether
> a SourceForge (or similar) page, a listing on PyPI, both, or what. Just
> something simpler than a SVN checkout.

We've done standalone email package releases in the past, and I think
we'll do the same with the new version, distributing it on the
cheeseshop. We'll either do this out of the 3.1 tree or from the
sandbox.

The tricky part will be dealing with the Python 2 back porting.
Hopefully we'll be able to use the mythical 3to2 tool that folks are
starting to talk about/work on, otherwise we'll have to manually
maintain a Python 2 port. I definitely think it's better to work the
details out for Py3 first though; it'll force us to be explicit about
bytes vs. strings, so we won't fall into the sloppiness of the current
code.

Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSdGIHnEjvBPtnXfVAQLPywQAjRs5JtxREGVyuG+eAJhh29ICrbMaucrz
/nVi8GBVTYzJWYJkzvvvc31VMY28xNLWPuO2uO10eVQd+zYfsa2oXOOXvvXM8PrH
taP+i1xzQ2b8ANbbehcBPosksOKCU8hpiMes7h43U9NuBGtf8NBaU50diT/N3jua
VQopywOTfEw=
=pmfa
-----END PGP SIGNATURE-----
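
Going back to the quoted-printable thread at the top of this archive, Mark's "something=0D=0A=" example can be checked with the standard quopri module. The second half is only a guess at the kind of "simple string-replace" Aaron describes (he did not post his code); bare_lf_to_crlf is a hypothetical helper name, and it assumes every bare LF in the decoded data was meant to be CRLF.

```python
import quopri
import re

# Mark's example: =0D and =0A are an encoded CR and LF; the trailing "="
# before the line break is a soft line break and yields no output byte.
encoded = b"something=0D=0A=\nor other"
decoded = quopri.decodestring(encoded)
print(decoded)   # b'something\r\nor other'


def bare_lf_to_crlf(data: bytes) -> bytes:
    """Hypothetical workaround: rewrite each LF not preceded by CR as CRLF.

    Assumption: the sender encoded CRLF pairs as literal line breaks that
    arrive as bare LF, so every bare LF really stands for CRLF.
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


print(bare_lf_to_crlf(b"a\nb\r\nc\n"))   # b'a\r\nb\r\nc\r\n'
```

For data where a lone LF can legitimately occur, a blanket replace like this would corrupt it, which fits Mark's point that base64 is the safer encoding when line terminators are significant.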