imaplib: is this really so unwieldy?

Thu May 27 12:42:43 EDT 2021

On 5/26/21 12:25 AM, Cameron Simpson wrote:
> On 25May2021 19:21, hw <hw at adminart.net> wrote:
>> On 5/25/21 11:38 AM, Cameron Simpson wrote:
>>> On 25May2021 10:23, hw <hw at adminart.net> wrote:
>>>> if status != 'OK':
>>>>     print('Login failed')
>>>>     exit
>>>
>>> Your "exit" won't do what you want. I expect this code to raise a
>>> NameError exception here (you've not defined "exit"). That _will_ abort
>>> the programme, but in a manner indicating that you've used an unknown
>>> name.  You probably want:
>>>
>>>      sys.exit(1)
>>>
>>> You'll need to import "sys".
>>
>> Oh ok, it seemed to be fine.  Would it be the right way to do it with
>> sys.exit()?  Having to import another library just to end a program
>> might not be ideal.
> 
> To end a programme early, yes. (sys.exit() actually just raises a
> particular exception, BTW.)
> 
> I usually write a distinct main function, so in that case one can just
> "return". After all, what seems an end-of-life circumstance in a
> standalone script like yours is just an "end this function" circumstance
> when viewed as a function, and that also lets you _call_ the main
> programme from some outer thing. Wouldn't want that outer thing
> cancelled, if it exists.
> 
> My usual boilerplate for a module with a main programme looks like this:
> 
>      import sys
>      ......
>      def main(argv):
>          ... main programme, return like any other function ...
>      .... other code for the module - functions, classes etc ...
>      if __name__ == '__main__':
>          sys.exit(main(sys.argv))
> 
> which (a) puts main() up the top where it can be seen, (b) makes main()
> an ordinary function like any other (c) lets me just import that module
> elsewhere and (d) no globals - everything's local to main().
> 
> The __name__ boilerplate at the bottom is the magic which figures out if
> the module was imported (__name__ will be the import module name) or
> invoked from the command line like:
> 
>      python -m my_module cmd-line-args...
> 
> in which case __name__ has the special value '__main__'. A historic
> mechanism which you will convince nobody to change.

Thanks, that seems like good advice.
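
A minimal sketch of the boilerplate quoted above, to make the "return instead of exit" idea concrete. The greeting logic, the helper exit_status_of() and the status values 0 and 2 are made up for illustration; only the main(argv) structure and the __name__ guard come from the thread.

```python
import sys


def main(argv):
    """Hypothetical main programme: returns an exit status instead of exiting."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} NAME", file=sys.stderr)
        return 2  # non-zero status signals failure to whoever called us
    print(f"Hello, {argv[1]}")
    return 0


def exit_status_of(argv):
    """Show that sys.exit() only raises SystemExit, which an outer caller can catch."""
    try:
        sys.exit(main(argv))
    except SystemExit as exc:
        return exc.code


# A real script would end with the usual guard, left as a comment here so that
# running this sketch does not terminate the interpreter:
#
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv))
```

Because main() only returns, another module can import this one and call main() without being terminated. As a side note, in a normal CPython run the site module usually provides a builtin named "exit", so a bare "exit" line tends to be evaluated and discarded rather than raise NameError, which would fit the "it seemed to be fine" observation.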

> You'd be surprised how useful it is to make almost any standalone
> programme a module like this - in the medium term it almost always pays
> off for me. Even just the discipline of shoving all the formerly-global
> variables in the main function brings lower-bugs benefits.

What do you do with it when importing it?  Do you somehow design your 
programs as modules in some way that makes them usable as some kind of 
library function?

>>> I've done little with IMAP. What's in msgnums here? Eg:
>>>      print(type(msgnums), repr(msgnums))
>>> just so we all know what we're dealing with here.
>>
>> <class 'list'> [b'']
>>
>>>> message_uuids = []
>>>> for number in str(msgnums)[3:-2].split():
>>>
>>> This is very strange. [...]
>> Yes, and I don't understand it.  'print(msgnums)' prints:
>>
>> [b'']
>>
>> when there are no messages and
>>
>> [b'1 2 3 4 5']
> 
> Chris has addressed this. msgnums is a list of the data components of the
> IMAP response.  By going str(msgnums) you're not getting "the message
> numbers as text" you're getting what printing a list prints. Which is
> roughly Python code: the brackets and the repr() of each list member.

Well, of course I expect to get message numbers returned from such a 
library function, not the raw imap response.  What is the library for 
when I have to figure it all out by myself anyway.

> Notice that the example code accessed msgnums[0] - that is the first
> data component, a bytes. That you _can_ convert to a string (under
> assumptions about the encoding).

And of course, I don't want to randomly convert bytes into strings ...

> By getting the "str" form of a list, you're forced into the weird [3:-2]
> hack to trim the ends. But it is just a hack for a transcription
> mistake, not a sane parse.

Right, that's why I don't like it, and it is part of what makes it so unwieldy.

>> So I was guessing that it might be an array containing a single
>> string and that referring to the first element of the array turns into
>> a string with which split() can be used.  But 'print(msgnums[0].split())'
>> prints
>>
>> [b'1', b'2', b'3', b'4', b'5']
> 
> msgnums[0] is bytes. You can do most str things with bytes (because that
> was found to be often useful) but you get bytes back from those
> operations as you'd hope.

As someone unfamiliar with python, I was wondering what this output 
means.  It could be items of an array that are bytes containing numbers, 
like, in binary, 00000001, 00000010, and so on.  That's more like what I 
would expect and not something I would want to convert into a string.
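
To make the difference concrete, here is a small sketch that uses only the values reported in this thread and needs no server. The variable names are illustrative. It shows that each element is ASCII text held in a bytes object, not a binary-encoded number.

```python
# Shape of the data reported in the thread (no server involved).
msgnums = [b'1 2 3 4 5']

# str() of a list yields the list's printed form, brackets and b'' included.
as_printed = str(msgnums)          # "[b'1 2 3 4 5']"
hack = as_printed[3:-2].split()    # works only by accident of that layout

# Taking the first data component and splitting keeps everything as bytes.
parts = msgnums[0].split()         # [b'1', b'2', b'3', b'4', b'5']

# Each element is ASCII text, not a binary number: b'1' is the single byte 0x31.
first_byte = parts[0][0]           # 49, the ASCII code of the character '1'

# Decode explicitly if integers are wanted.
numbers = [int(p.decode('ascii')) for p in parts]

# The empty-mailbox case, [b''], splits to an empty list.
empty = [b''][0].split()
```

So b'1' is the one-byte text "1", which is why int() after an explicit ASCII decode is a reasonable way to get numbers out of it.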

>> so I can only guess what that's supposed to mean: maybe an array of
>> many bytes?  The documentation[1] clearly says: "The message_set
>> options to commands below is a string [...]"
> 
> But that is the parameter to the _call_: your '(UID)' parameter.

No, it's not a uid.  With this library, I haven't found a way to get uids. 
The function to call requires a string, not bytes or an array of bytes.

The included example contradicts the documentation and leaves potential 
users of the library to guessing.

>> I also need to work with message uids rather than message numbers
>> because the numbers can easily change.  There doesn't seem to be a way
>> to do that with this library in python.
> 
> By asking for UIDs you're getting uids. Do they not work in subsequent
> calls?

How do you force the library to give uids instead of message numbers, 
and how do you make it do something with a particular message by uid?
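
For what it's worth, imaplib does expose the IMAP "UID" command through IMAP4.uid(), which prefixes another command so that it works on UIDs rather than sequence numbers. Below is a hedged sketch, not a tested recipe against any particular server. The function name fetch_all_by_uid is made up, it assumes "conn" is already logged in with a mailbox selected, and it assumes the usual response shape for a plain RFC822 fetch.

```python
import imaplib


def fetch_all_by_uid(conn: imaplib.IMAP4):
    """Return {uid: raw_message_bytes} for every message in the selected mailbox.

    Assumes conn is a logged-in IMAP4 (or IMAP4_SSL) object on which
    select() has already been called.
    """
    # "UID SEARCH ALL": the reply carries UIDs instead of sequence numbers.
    status, data = conn.uid('SEARCH', None, 'ALL')
    if status != 'OK':
        raise RuntimeError(f'UID SEARCH failed: {data!r}')

    messages = {}
    for uid in data[0].split():          # e.g. [b'101', b'102']
        # "UID FETCH <uid> (RFC822)": address the message by its UID.
        # The bytes from split() are handed straight back to the library.
        status, parts = conn.uid('FETCH', uid, '(RFC822)')
        if status != 'OK':
            raise RuntimeError(f'UID FETCH {uid!r} failed: {parts!r}')
        # For a plain RFC822 fetch the first item is normally a
        # (header, payload) tuple; the payload is the raw message.
        messages[uid.decode('ascii')] = parts[0][1]
    return messages
```

The only decoding is of the UID itself, which consists of digits. The message body stays as bytes and can be handed to email.message_from_bytes() if a parsed message is wanted.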

>> So it's all guesswork, and I gave up after a while and programmed what
>> I wanted in perl.  The documentation of this library sucks, and there
>> are worlds between it and the documentation for the libraries I used
>> with perl.
> 
> I think you're better off looking for another Python imap library. The
> imaplib was basic functionality to (a) access the protocol in basic form
> and (b) conceal the async stuff, since IMAP is an asynchronous protocol.
> 
> You can in fact subclass it to do better things. Other libraries might do
> that, or they might have written their own protocol implementations.

I haven't really found a better one.

>> That doesn't mean I don't want to understand why this is so unwieldy.
>> It's all nice and smooth in perl.
> 
> But using what library? Something out of CPAN? Those are third party
> libraries, not Perl's presupplied stuff.

The libraries I used with perl all come with Fedora, so they are 
presupplied and only need to be installed.  You can find them on cpan as 
well.  It's very convenient.

> The equivalent for Python is
> pypi.org. Look there.

Yes, someone pointed it out, and I did.  They have an imap-tools library 
which looks most promising at first, and its documentation tells me that 
you can do some IMAP operations with messages --- and nothing with 
messages themselves, like getting them from the server.

So it seems that IMAP support through python is virtually non-existent.

>>> It is just breaking apart data[0] into strings which were separated by
>>> whitespace in the response. And then using those same strings as keys
>>> for the .fetch() call. That doesn't seem complex, and in fact is blind
>>> to the format of the "message numbers" returned. It just takes what it
>>> is handed and uses those to fetch each message.
>>
>> That's not what the documentation says.
> 
> The _example code_ is blind to them, whatever the semantics of the docs.
> It just gets the uids and fetches with them. Aside from .split(),
> there's no parsing or deep understanding of the uid.

There is no telling what might happen to the messages when using 
functions that randomly convert bytes in weird ways and that contradict 
the documentation.  Why would I trust this library not to accidentally 
delete all my emails or to not do other harmful things?

>>> IMAP's quite complex. Have you read RFC2060?
>>>
>>>      https://datatracker.ietf.org/doc/html/rfc2060.html
>>
>> Yes, I referred to it and it didn't become any more clear in
>> combination with the documentation of the python library.
> 
> IMAP's like that :-)

IMAP seems fine :)

>>> The imaplib library is probably a fairly basic wrapper for the
>>> underlying protocol which provides methods for the basic client requests
>>> and conceals the asynchronicity from the user for ease of (basic) use.
>>
>> Skip Montanaro seems to say that the byte problem comes from the
>> change from python 2 to 3 and there is a better library now:
>> https://pypi.org/project/IMAPClient/
> 
> And someone mentioned imaplib2. There are several choices.

Does that exist?  pypi.org says the project description is unknown, so I 
didn't look closer. ... Hm, ok there's a git repo that seems to be alive 
and it even has some documentation.

>> But the documentation seems even more sparse than the one for imaplib.
>> Is it a general thing with python that libraries are not well
>> documented?
> 
> That depends on the library - it is of course at the whim of the
> developer. Heavily used powerful libraries are usually well documented,
> eg numpy. A random hacker's published module? Might have nothing.

Well, gajim seems to be using an xmpp library for which there is no 
documentation ...  Other than that, is there a documented xmpp library 
for python which is useful?

In particular, I'd like to have an xmpp library that implements HTTP 
file upload because that's missing from the perl libraries, and I 
haven't been able to figure out how to implement it myself.

> Wrappers for protocols like IMAP might have a bit of doco and expect the
> user to infer stuff from knowledge of the protocol itself.

The perl libraries do that.

> [...]
>>> Anyway, the IMAP responses are bytes containing text. You get a lot of
>>> bytes.
>>
>> Well, ok, but it's not helpful that b is being inserted like
>> everywhere,
> 
> Only when you _print_ them. That "b" is an indicator that this is a bytes
> object being printed in a stringlike form because that is often a useful
> representation. Nothing's inserting a "b" in the data itself.

I was just trying to do something to get the return of one function of 
the library usable as the input for another function of the same 
library, and that involved random conversions of bytes into strings 
(which may work in one case and can go wrong at any time) ...  I can't 
help thinking that a library which is incompatible with itself isn't 
really a good library ...

>> and I have to keep asking myself what I'm looking at because bytes are
>> bytes.
> 
> If you've got b'abc', that is a printout of a 3 byte string. _Not_ the
> string itself.

Who says it's a string?

RFC 3501 is quite specific in section 2.2: "All interactions transmitted 
by client and server are in the form of lines, that is, strings that end 
with a CRLF.", and it describes what strings are in section 4.3.

I would say a library that gives me responses from an IMAP server as 
bytes --- which could be anything --- rather than strings is not giving 
me what I can expect and probably does something wrong.

Of course, I could read the source code of the library, but that pretty 
much defeats the point of using a library.

>> Since the documentation is so bad, I had to figure it out by trial and
>> error and by printing stuff and making guesses and assumptions.
>> That's just not the way to program something.
> 
> No. But _many_ modules are what the original author needed to get
> something done, and neither complete nor perfectly documented. Life's
> too short. Well used modules tend to become more complete, elegant and
> documented over time, _if_ people other than the author use them.

Right.  Fortunately, perl offers good libraries for the purpose, so I 
used those instead.

>>> When you go:
>>>
>>>      text = str(data)
>>>
>>> that is _assuming_ a particular text encoding stored in the data. You
>>> really ought to specify an encoding here. If you've not specified the
>>> CHARSET for things, 'ascii' would be a conservative choice. The IMAP RFC
>>> talks about what to expect in section 4 (Data Formats). There's quite a
>>> lot of possible response formats and I can understand imaplib not
>>> getting deeply into decoding these.
>>
>> UTF8 is the default since quite a while now.  Why doesn't it just use that?
> 
> "since quite a while now" is pretty vague. Look at the date on RFC2060,
> and remember that that is version 4 of the protocol, _after_ a time
> consuming standardisation process. So subtract several years from that.
> 
> Regardless, _assuming_ that arbitrary bytes are UTF8 is error prone and
> foolhardy and will lead to mojibake. If a protocol specifies a UTF8
> encoding, you're good.  Otherwise you need to find out what is actually
> going on. Some protocols are old enough to assume the world is ASCII.
> Some take a less rigid view that they're processing "text" at all, for
> all that their contents are effectively text.
> 
> Notice that the RFC specifies that strings are 7-bit text except in
> certain circumstances. And those strings are probably ASCII.

Fortunately, the perl libraries just gave me uids when I asked for them, 
and I didn't have to worry about anything :)
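
The encoding point quoted above can be illustrated without any IMAP at all. This is a sketch with made-up sample data ('café' and the UID 101), showing why guessing an encoding is riskier than naming one, and why ASCII is a safe choice for things that are only digits.

```python
raw = 'café'.encode('utf-8')       # 5 bytes for 4 characters

# Right encoding: round-trips cleanly.
good = raw.decode('utf-8')

# Wrong guess that happens to accept every byte: silent mojibake.
mojibake = raw.decode('latin-1')

# Strict ASCII refuses non-ASCII bytes instead of guessing.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# str() on bytes does not decode at all; it gives the printed form.
printed = str(b'101')              # "b'101'"

# Message numbers and UIDs are digits, so ASCII is a safe assumption for them.
uid_text = b'101'.decode('ascii')  # '101'
```

A wrong encoding either corrupts the text quietly or fails loudly, and only the loud failure is easy to notice. That is the argument for an explicit .decode('ascii') on the numeric parts of a response.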

>> Then its documentation should at least specify what the library does.
> 
> It mostly does, and that doing is relatively basic. Does it say it
> provides high level post parsed data structures? No. It describes what
> it gets back from the IMAP responses, and further/deeper parse is up to
> the user. Who is free to subclass the IMAP class and add such parsing.
> Or to find or write a more Pythonic module (==> one with more normal
> Python idioms and working in Python classes like str more than in bytes).
> But the _protocol_ passes bytes data around.

It doesn't really ...

>>> So having passed '(UID)' to the SEARCH request, you now need to parse
>>> the response.
>>
>> First I have to guess what the response might be ...  And once I
>> managed that, there's still no way to do something with a message by
>> its uid.
> 
> You can print the response, then look at the RFC to understand what it
> is and how to parse it.
> 
> But this is grittiness you're not interested in, which means you need to
> look for a higher level library.

Right, and this isn't good for someone who has just begun to learn 
python and has no idea what he does.  Maybe I'll find something better 
suited to learning.