[Python-ideas] New imaplib implementation for Python 3.2+ standard library

Thu Jul 28 03:35:24 CEST 2011

On Wed, Jul 27, 2011 at 6:34 PM, Menno Smits <menno at freshfoo.com> wrote:
> (I'm replying via Google Groups because I just joined and don't have
> this thread in my email inbox. It's being a bit flaky so apologies if
> this comes through twice).
>
> On Jul 26, 5:54 pm, Maxim Khitrov <m... at mxcrypt.com> wrote:
>>
>> One of the things I'm trying to address with my library is strict
>> adherence to the current version of the IMAP4 protocol. The other is
>> performance; hence the implementation of extensions such as SASL-IR,
>> IDLE, non-synchronizing literals, multiappend, and compression.
>>
>> On the performance side, if you have an application that's trying to do
>> some sort of processing of a 6 GB mailbox with 700,000 messages in it,
>> executing a separate FETCH command for each message will take you a
>> week to finish. If you try to be clever and FETCH 1000 messages at a
>> time, for example, you'll quickly run into a few problems:
>>
>> ...
>> All are interface design problems, which are inherited by IMAPClient.
>>
>> - Max
>>
>> P.S. It is not my intention to discourage the use of IMAPClient in any
>> way. Its existence is a good thing for 99% of the users, because it
>> does address a number of key imaplib issues with just the response
>> parser and a UTF-7 codec. My point is that there are real-world use
>> cases out there that cannot be handled by imaplib or IMAPClient, and
>> for those, I'm offering my library as a more general solution that
>> should satisfy the remaining 1% :)
>
> As the maintainer of IMAPClient, I thought I'd weigh in.
>
> ...
>
> I think imaplib2 is a very capable IMAP client library and the Python
> community could only benefit from having something like it in the
> standard library (on the proviso, as Brett mentions, that the Python
> community supports the library by using it widely).

Thanks for the kind words :)

> Here's a few comments about imaplib2 from my own biased perspective:
>
> It requires too much effort on behalf of the caller. Your example.py
> highlights how datetimes are returned as strings that need to be
> converted to real datetimes and FETCH response keys need to be
> uppercased to ensure consistency. The need to jump through the same
> non-trivial hoops each time I used imaplib was one of the frustrations
> that led to the creation of IMAPClient. Please consider having
> imaplib2 do a little more work so that every user doesn't have to.

Part of this will be addressed by the higher-level interface that I'm
currently working on. As for imaplib2, there are two reasons why I
decided not to do any sort of automatic normalization of the responses
(with the exception of CAPABILITY):

1. Performance. Not all responses (and parts of a response) are useful
to the caller. There is no point in having the library perform
response-specific normalization just to have the whole thing discarded
as soon as it is returned. Originally, I even played with the idea of
a lazy parser (i.e. parse the response only if some attribute or data
item is accessed), but decided to go for a simpler implementation in
the end.

2. Consistency, expectations, and bugs. The normalization processes
may not do the Right Thing for every single response. Ultimately, only
the caller knows for sure what content to expect from the server,
especially if you are trying to implement some server-specific
commands or a new extension. The library only knows the general syntax
rules. If you start to assume that all returned responses are
normalized, you could run into some unwelcome surprises when that
normalization fails or even corrupts some data for a response type
that wasn't recognized.
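
A minimal sketch of the lazy-parser idea from point 1, to show what
"parse only if some attribute is accessed" could look like. The class
name LazyResponse and the whitespace split are hypothetical stand-ins;
this is not how the library works, and a real IMAP parser would also
have to handle quoted strings, literals, and parenthesized lists.

```python
class LazyResponse:
    """Holds a raw response line and parses it on first access only."""

    def __init__(self, raw):
        self.raw = raw
        self._fields = None  # nothing parsed yet

    @property
    def fields(self):
        # Parse once, then reuse the cached result on later accesses.
        if self._fields is None:
            # Hypothetical stand-in for a real IMAP response parser.
            self._fields = self.raw.split()
        return self._fields


resp = LazyResponse("* 12 EXISTS")
# No parsing has happened yet; a caller that discards resp pays nothing.
count = int(resp.fields[1])  # first access triggers the parse
```

The cost of this approach is extra state on every response object,
which is one reason a simpler eager parser can be the better choice.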

So basically, I think that in a low-level library such as this, it
should be the caller's decision whether an INTERNALDATE value is
converted to Unix time (or some other format), or if the FETCH
response keys are changed to upper case. I'm happy to provide
additional utility functions for such conversions, but trying to
handle these things automatically could be a source of many additional
bugs. Think about the separation between zlib and gzip, or binascii
and base64 modules. My library is the low-level interface and I'm
working on something that will be easier to use at the cost of some
control.
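
As an illustration of the kind of utility functions meant here, the
following is a rough sketch only. The names parse_internaldate and
upper_keys are made up for this example and are not part of the
library; the date string uses the INTERNALDATE layout shown in RFC 3501.

```python
from datetime import datetime, timezone


def parse_internaldate(value):
    """Convert an INTERNALDATE string to a timezone-aware datetime.

    Note: %b matches month names for the current locale, so this
    assumes the C/English locale.
    """
    return datetime.strptime(value.strip(), "%d-%b-%Y %H:%M:%S %z")


def upper_keys(items):
    """Return a copy of a FETCH item mapping with upper-cased keys."""
    return {key.upper(): val for key, val in items.items()}


# Hypothetical FETCH data, as a caller might receive it unnormalized.
raw = {"internaldate": "17-Jul-1996 02:44:25 -0700", "Flags": ["\\Seen"]}
item = upper_keys(raw)
when = parse_internaldate(item["INTERNALDATE"])
when_utc = when.astimezone(timezone.utc)
unix_time = when.timestamp()  # seconds since the epoch
```

The caller opts in to each conversion, so a response the library does
not recognize is never touched.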

> Similarly, UID support could be better. IMAPClient has a boolean
> attribute which lets you select whether you want UIDs to be
> transparently used for future commands. Having to specify whether you
> want UID support enabled on each call is a little clumsy. It's
> unlikely that a user of imaplib2 would want to toggle between using
> UIDs and not on every call.

I have to disagree with you here. The application that I wrote this
library for does depend on the ability to run UID and regular FETCH
commands in the same connection. I was actually very surprised to see
that IMAPClient requires you to pick one or the other at creation time.

In some applications you may need to discover and use the
relationships between SNs and UIDs, or use a command like UID EXPUNGE
(from UIDPLUS extension) and a regular EXPUNGE in the same session. I
think that you do have to let the user make this decision on a
per-command basis.
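
To make the per-command choice concrete, here is a small sketch. The
build_command helper is hypothetical and is not the library's actual
interface; it only shows that the UID prefix is a property of each
command line, so both forms can be mixed freely on one connection.

```python
def build_command(tag, name, *args, uid=False):
    """Build one IMAP command line, optionally in its UID form."""
    parts = [tag]
    if uid:
        parts.append("UID")  # e.g. "UID FETCH", "UID EXPUNGE"
    parts.append(name)
    parts.extend(args)
    return " ".join(parts) + "\r\n"


# Map sequence numbers 1-10 to their UIDs with a regular FETCH...
by_seq = build_command("A001", "FETCH", "1:10", "(UID)")
# ...then expunge a UID range (UIDPLUS) in the same session.
by_uid = build_command("A002", "EXPUNGE", "4827313:4828442", uid=True)
```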

> This has already been mentioned but imaplib2 won't get accepted into
> the stdlib if you don't conform to PEP 8. Those tabs have to go.

I know. I'll reformat everything once all the major coding is done.

> How much testing has imaplib2 seen against real IMAP implementations?
> Throughout IMAPClient's history its users have found many unexpected
> behaviours in various popular IMAP implementations. Those discoveries
> have led to updates to IMAPClient's code and tests (this is the
> "battle-tested" aspect that Michael refers to). On top of its unit
> tests, IMAPClient has a fairly extensive live test script that can be
> run (destructively) against a real IMAP account. I have test accounts
> with a number of different IMAP implementations which I regularly test
> IMAPClient against. A set of "live" tests is invaluable for testing
> new features and avoiding regressions between versions. It would be
> interesting to see what problems you find if you set up something
> similar for imaplib2.

I've tested against Gmail servers and Microsoft Exchange 2007, and run
simulated tests based on example sessions in various RFCs and other
sources. I also wrote a "shell" script that connects to an IMAP server
and goes into interactive mode, allowing me to run IMAP or Python
commands exactly as you would in an interactive Python session. I'll
try to upload it to the repository in the next few days.

My library does need more testing. Although I tried to follow the
robustness principle (be conservative in what you send; be liberal in
what you accept) when writing the command generator and response
parser, there probably are some bugs remaining, but hopefully not
many.

Which IMAP servers do you test against and how did you go about
getting the test accounts?

> [1] - Are you aware there's already another project with the same name?
> http://www.janeelix.com/piers/python/imaplib2.html

Hmm... I probably should have tried searching before using that name.
I'm happy to go with something else, since my library is not in
widespread use right now. Would suggesting imaplib3 for the stdlib be a
bit confusing? :/

That looks like another improvement of imaplib, which uses threads to
achieve some asynchronous execution. Can't say that I like the
approach, but I do admire the effort. They even got compression
working, but that was at the cost of having to implement readline in
Python rather than relying on BufferedReader. That was one of the
bigger challenges for me as well, but I opted to write my own SocketIO
class for this.
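
For readers unfamiliar with the problem, here is one way a raw I/O
layer can inflate compressed data while still leaving readline to
BufferedReader. This is only a sketch of the general technique, not
the SocketIO class mentioned above; the name InflateReader is invented,
and it assumes a socket-like object with a recv() method and a raw
DEFLATE stream (as used by COMPRESS=DEFLATE).

```python
import io
import zlib


class InflateReader(io.RawIOBase):
    """Raw stream that inflates data received from a socket-like object."""

    def __init__(self, sock):
        super().__init__()
        self._sock = sock  # assumed to provide recv(n)
        # Negative wbits selects a raw DEFLATE stream (no zlib header).
        self._inflate = zlib.decompressobj(-zlib.MAX_WBITS)
        self._pending = b""  # inflated bytes not yet handed out

    def readable(self):
        return True

    def readinto(self, buf):
        # Keep receiving until some inflated data is available or EOF.
        while not self._pending:
            chunk = self._sock.recv(4096)
            if not chunk:
                return 0  # connection closed
            self._pending = self._inflate.decompress(chunk)
        n = min(len(buf), len(self._pending))
        buf[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n
```

Wrapping an instance in io.BufferedReader then provides readline() and
read(n) on the decompressed stream without reimplementing them.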

- Max