From lutz at rmi.net  Fri May  7 13:15:19 2010
From: lutz at rmi.net (lutz at rmi.net)
Date: Fri, 7 May 2010 07:15:19 -0400 (EDT)
Subject: [Email-SIG] email package status in 3.X?
Message-ID: <25286552.1273230920059.JavaMail.root@mswamui-thinleaf.atl.sa.earthlink.net>

I'm updating the current Programming Python for 3.X, as well as
its fairly large email client examples (GUI- and web-based) that
rely on the email package heavily.  I've gotten to the point
where I need to decode the bytes of a message fetched with poplib
into the Unicode strings expected by the email parser, and have run
into a dilemma--because decoding may require header inspection,
it appears that scripts need to parse in order to decode, but
need to decode in order to parse.

I know this is being discussed and may be addressed soon, but
because the email package is crucial to this book's largest
examples, I'm looking for a bit more information on this:

--What's the current ETA on a new version of the email parser
which handles byte strings?  The web suggests it might be 3.2,
3.3, or even 3.4.  It seems to still be in early stages.

--How backward compatible will the new email be?  I'm assuming
it will handle bytes but be otherwise very similar, but 3.x set
quite a precedent for changes, and changes break books.

Any updates on this would be appreciated; for better or worse,
email is a major dependency for one of the flagship Python books
out there.  Since postponing the update probably isn't an option,
I'm leaning towards decoding per a user-configurable default
(latin1 or utf8?) for now, but that's less than ideal.

Thanks,
--Mark Lutz

(feel free to cross-post if this belongs elsewhere, and please
respond to my email address directly)

From rdmurray at bitdance.com  Sun May  9 23:31:25 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Sun, 09 May 2010 17:31:25 -0400
Subject: [Email-SIG] email package status in 3.X?
In-Reply-To: <25286552.1273230920059.JavaMail.root@mswamui-thinleaf.atl.sa.earthlink.net>
References: <25286552.1273230920059.JavaMail.root@mswamui-thinleaf.atl.sa.earthlink.net>
Message-ID: <20100509213125.561A6206FCC@kimball.webabinitio.net>

On Fri, 07 May 2010 07:15:19 -0400, lutz at rmi.net wrote:
> --What's the current ETA on a new version of the email parser
> which handles byte strings? The web suggests it might be 3.2,
> 3.3, or even 3.4. It seems to still be in early stages.

My best guess at this point (it's an informed guess, but still very much
a guess) is that email6 will be available in 3.3, and I am hoping there
will be a pypi package available for testing it under 3.1/3.2 some time
before the end of this year.

> --How backward compatible will the new email be? I'm assuming
> it will handle bytes but be otherwise very similar, but 3.x set
> quite a precedent for changes, and changes break books.

Sorry to be the bearer of bad news from your point of view, but there
are indeed likely to be a number of fairly significant changes.  The plan
is to provide a backward compatibility layer, but that probably doesn't
help you much since you'd presumably rather discuss the "official" API.

> Any updates on this would be appreciated; for better or worse,
> email is a major dependency for one of the flagship Python books
> out there. Since postponing the update probably isn't an option,
> I'm leaning towards decoding per a user-configurable default
> (latin1 or utf8?) for now, but that's less than ideal.

Email is a major dependency for a number of things, and IMO is perhaps
the biggest thing blocking Python3 adoption that the Python development
community has any control over.  Unfortunately there is currently a
distinct lack of volunteer time to work on it.

Several of us are working on ways to support and speed up email6
development.
There is a GSoC student who will be doing some work,
with me as mentor, and I am hoping to get funding to be able to spend
a significant number of hours on the package on a contract-programming
basis as well.  There are structures the PSF needs to put in place before
I can do fundraising for that, however.  If you know anyone who might
want to just pay me for it straight out, let me know :)

As for what you do *now*...unfortunately I don't know of any answer that
works, otherwise we'd have implemented it.

--
R. David Murray                                      www.bitdance.com

From lutz at rmi.net  Mon May 10 20:02:46 2010
From: lutz at rmi.net (lutz at rmi.net)
Date: Mon, 10 May 2010 14:02:46 -0400 (EDT)
Subject: [Email-SIG] email package status in 3.X?
Message-ID: <12392538.1273514567375.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>

Thanks very much for your reply.  I'm probably going to have to
go ahead and finish the book with the email package as it is now,
and include a lot of caveats about the problems that a new version
may fix in the future.  I can also post updated example code
if/when possible.

I realize everybody on this list probably knows this already,
but email in 3.X not only doesn't support the Unicode/bytes
dichotomy, it was also broken by it.  Beyond the pre-parse
decode issue, its mail text generation really only works for
all-text mails.  Generating text of an email with any sort of
binary part doesn't work at all now, because the base64 text
is still bytes, and the Generator expects str.  I've coded a
custom encoder to pass to MIMEImage that works around this
by decoding to ASCII, but it's not a great story to have to
tell the tens of thousands of readers of this book, many of
whom will be evaluating 3.X in general.

It's unfortunate, IMHO, that the powers that be chose to ship
Python 3.0 with a badly broken email package.  This probably
could have been avoided with a short period of concerted effort
by pydev, and I think it does leave 3.X with a bit of a black
eye.
Two years later, the 3.0 I/O speed issue has been fixed
but this has not?  Odd, that.  I'm also not convinced that
poplib, smtplib, or ftplib in 3.X completely address the brave
new Unicode world either, but time and 3.X users will tell.

Then again, such is life in realistic software development.
At the end of the day, I suppose this isn't a bad lesson for
readers to learn.  As for funding, I don't have any specific
ideas, but this project should clearly be a top priority.

Thanks again,
--Mark Lutz

From rdmurray at bitdance.com  Mon May 10 21:21:42 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 10 May 2010 15:21:42 -0400
Subject: [Email-SIG] email package status in 3.X?
In-Reply-To: <12392538.1273514567375.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>
References: <12392538.1273514567375.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>
Message-ID: <20100510192142.1F13A2093C9@kimball.webabinitio.net>

On Mon, 10 May 2010 14:02:46 -0400, lutz at rmi.net wrote:
> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it. Beyond the pre-parse
> decode issue, its mail text generation really only works for
> all-text mails. Generating text of an email with any sort of
> binary part doesn't work at all now, because the base64 text
> is still bytes, and the Generator expects str.
> I've coded a

There's an open bug report for this, and it can be addressed with a fix
in the current package (I just bumped the prio to critical to make sure
I get it into the next release).

> custom encoder to pass to MIMEImage that works around this
> by decoding to ASCII, but it's not a great story to have to
> tell the tens of thousands of readers of this book, many of
> whom will be evaluating 3.X in general.
>
> It's unfortunate, IMHO, that the powers that be chose to ship
> Python 3.0 with a badly broken email package. This probably
> could have been avoided with a short period of concerted effort
> by pydev, and I think it does leave 3.X with a bit of a black
> eye. Two years later, the 3.0 I/O speed issue has been fixed
> but this has not? Odd, that. I'm also not convinced that

Well, speeding up IO was a matter of rewriting an already designed and
implemented python-based package in C, and volunteers with an interest
stepped forward to do that job.  Fixing email involves designing and
implementing a new version of email that can handle the separation
between bytes and unicode correctly.  (You will note that the 2.x package
did not do so, and that fact is the source of many still-open bugs.)
Unfortunately, none of the email experts involved in Python development
had any time available to do this work, and until I expressed interest
at the end of last year no new volunteers had come forward to write code.

> poplib, smtplib, or ftplib in 3.X completely address the brave
> new Unicode world either, but time and 3.X users will tell.

I am afraid that you are correct.  We've found and fixed a few things,
but I'm pretty sure there are more waiting to be found.  If you have
time to file bugs for anything you come across, that would be most
helpful.

> Then again, such is life in realistic software development.

Particularly in the primarily-volunteer open source world.

> At the end of the day, I suppose this isn't a bad lesson for
> readers to learn.
> As for funding, I don't have any specific
> ideas, but this project should clearly be a top priority.

Thanks.  I've forwarded your note to the PSF board, as a reminder of
how important this is ;)

--
R. David Murray                                      www.bitdance.com

From matt at mondoinfo.com  Mon May 10 21:51:46 2010
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Mon, 10 May 2010 14:51:46 -0500 (CDT)
Subject: [Email-SIG] email package status in 3.X?
In-Reply-To: <12392538.1273514567375.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>
References: <12392538.1273514567375.JavaMail.root@mswamui-andean.atl.sa.earthlink.net>
Message-ID: <1273517420.44.12232@mint-julep.mondoinfo.com>

Mark,

> I realize everybody on this list probably knows this already,
> but email in 3.X not only doesn't support the Unicode/bytes
> dichotomy, it was also broken by it.

Yes, it's a shame that it has worked out that way.  I think it's
because email is an almost uniquely hard problem when you try to
make a sharp distinction between text and bytes.

When you receive an email, what have you got?  It's supposed to be
ASCII, but of course it often isn't.  What character set should you
assume that those eight-bit characters are in?  The program that's
using the module probably does want to try to guess since it
probably wants to make as much sense as possible out of an
incorrectly formed email.  The same goes for mis-specified
encodings, both in headers and in MIME parts.

So you probably need to provide multiple ways of getting at headers
and the MIME parts that claim to be text.  You'll want to be able to
get at the original data (probably as bytes for safety) and the text
version if one can be created.  And so forth.

Happily passing eight-bit strings around with the assumption that
the user would make the correct sense of them mapped onto email
really well.  Trying to make a strict distinction between bytes and
text turns out to be a bit of a mess in this context.

But you probably already knew all that as well.
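
A minimal sketch of the idea discussed in this thread: keep the
original bytes, and derive text from them by trying a
user-configurable list of encodings before handing the result to the
str-based parser.  The function name decode_message_bytes, the order
of the encodings, and the sample message are assumptions made for
illustration; none of them come from the email package.  A wrong
guess yields mojibake rather than an error, which is why the raw
bytes are kept alongside the text.

```python
import email


def decode_message_bytes(raw, encodings=("ascii", "utf-8", "latin-1")):
    """Return (text, encoding_used) for the first encoding that decodes raw.

    Hypothetical helper: the encoding list stands in for the
    "user-configurable default" mentioned in the thread.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Nothing in the list worked: fall back to ASCII and substitute
    # U+FFFD for undecodable bytes so the parser still gets a str.
    return raw.decode("ascii", errors="replace"), "ascii"


# Sample bytes as they might arrive from a POP server (made-up message).
# The Subject header contains a raw 0xE9 byte, which is not valid ASCII
# or UTF-8 here, so the helper ends up using latin-1.
raw = b"From: a@example.com\r\nSubject: caf\xe9\r\n\r\nBody text\r\n"

text, used = decode_message_bytes(raw)
msg = email.message_from_string(text)  # the parser still wants str

print(used)
print(msg["From"])
```

The caller holds on to raw, so a later and better guess (for example
one based on a charset parameter found after parsing) can re-decode
from the original data instead of from already-mangled text.
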

Regards,
Matt

From barry at python.org  Wed May 12 16:42:12 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 12 May 2010 16:42:12 +0200
Subject: [Email-SIG] Fw: mime stuff :)
Message-ID: <20100512164212.4f0a75da@heresy>

Robert Collins is a colleague of mine at Canonical.  He accosted me in
the halls of UDS and reminded me of some code he'd written to handle
content types and byte/string content.  I don't have much time right now
to comment on it, but it looks interesting and I want to get this into
our archives.  We should look at it for some of our low-level
implementation bits.

-Barry

Begin forwarded message:

Date: Tue, 11 May 2010 20:32:46 +1200
From: Robert Collins
To: barry.warsaw at canonical.com
Subject: mime stuff :)

So, you've seen the code, it's a good idea, you said you'd prod someone
to say 'yes and merge' ;)

http://bazaar.launchpad.net/~testtools-dev/testtools/trunk/annotate/head:/testtools/content_type.py
http://bazaar.launchpad.net/~testtools-dev/testtools/trunk/annotate/head:/testtools/content.py

Tests are
http://bazaar.launchpad.net/~testtools-dev/testtools/trunk/annotate/head:/testtools/tests/test_content_type.py
http://bazaar.launchpad.net/~testtools-dev/testtools/trunk/annotate/head:/testtools/tests/test_content.py

Cheers,
Rob