imaplib: is this really so unwieldy?

Tue May 25 18:25:49 EDT 2021

On 25May2021 19:21, hw <hw at adminart.net> wrote:
>On 5/25/21 11:38 AM, Cameron Simpson wrote:
>>On 25May2021 10:23, hw <hw at adminart.net> wrote:
>>>if status != 'OK':
>>>    print('Login failed')
>>>    exit
>>
>>Your "exit" won't do what you want. I expect this code to raise a
>>NameError exception here (you've not defined "exit"). That _will_ abort
>>the programme, but in a manner indicating that you've used an unknown
>>name.  You probably want:
>>
>>     sys.exit(1)
>>
>>You'll need to import "sys".
>
>Oh ok, it seemed to be fine.  Would it be the right way to do it with 
>sys.exit()?  Having to import another library just to end a program 
>might not be ideal.

To end a programme early, yes. (sys.exit() actually just raises a 
particular exception, BTW.)
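
For what it's worth, a small sketch of that (the helper name and the 
exit code 3 are made up for illustration): sys.exit() raises SystemExit, 
which can be caught like any other exception. Note also that a bare 
"exit" with no parentheses is only a name lookup; in an interpreter 
started the usual way the site module defines that name, so the line is 
evaluated and discarded without stopping anything.

```python
import sys


def exit_code_of(func):
    """Call func() and return the code carried by SystemExit, if raised."""
    try:
        func()
    except SystemExit as e:
        # sys.exit(n) stores n on the exception as .code
        return e.code
    return None


code = exit_code_of(lambda: sys.exit(3))
print(code)
```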

I usually write a distinct main function, so in that case one can just 
"return". After all, what seems an end-of-life circumstance in a 
standalone script like yours is just an "end this function" circumstance 
when viewed as a function, and that also lets you _call_ the main 
programme from some outer thing. Wouldn't want that outer thing 
cancelled, if it exists.
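
A rough illustration of the "just return" idea, with an invented main() 
and an invented outer caller (neither is from your script):

```python
def main(argv):
    """Hypothetical main programme: returns an exit status, never exits."""
    if len(argv) < 2:
        print("usage:", argv[0], "mailbox")
        return 2
    print("would process mailbox", argv[1])
    return 0


def outer():
    """Some outer thing which calls main() twice and keeps running."""
    return [main(["prog"]), main(["prog", "INBOX"])]


statuses = outer()
print(statuses)
```

The first call fails with status 2, and the outer function carries on 
to the second call regardless.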

My usual boilerplate for a module with a main programme looks like this:

    import sys
    ......
    def main(argv):
        ... main programme, return like any other function ...
    .... other code for the module - functions, classes etc ...
    if __name__ == '__main__':
        sys.exit(main(sys.argv))

which (a) puts main() up the top where it can be seen, (b) makes main() 
an ordinary function like any other, (c) lets me just import that module 
elsewhere and (d) means no globals - everything's local to main().

The __name__ boilerplate at the bottom is the magic which figures out if 
the module was imported (__name__ will be the import module name) or 
invoked from the command line like:

    python -m my_module cmd-line-args...

in which case __name__ has the special value '__main__'. A historic 
mechanism which you will convince nobody to change.
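
If you want to see the two values of __name__ for yourself, here is a 
self-contained sketch. It writes a throwaway my_module.py to a temporary 
directory, runs it as a script, then imports it. The module body is 
invented for the demonstration, and it runs the file by path; I'd expect 
"python -m" to show the same '__main__' value.

```python
import importlib.util
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

MODULE_SOURCE = textwrap.dedent('''\
    import sys

    def main(argv):
        print("__name__ is", __name__)
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv))
''')


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "my_module.py"
        path.write_text(MODULE_SOURCE)
        # Run as a programme: the guard at the bottom fires.
        run = subprocess.run(
            [sys.executable, str(path)], capture_output=True, text=True
        )
        # Import it: the guard does not fire, main() is just a function.
        spec = importlib.util.spec_from_file_location("my_module", path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return run.stdout.strip(), mod.__name__


script_output, imported_name = demo()
print(script_output)
print(imported_name)
```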

You'd be surprised how useful it is to make almost any standalone 
programme a module like this - in the medium term it almost always pays 
off for me. Even just the discipline of shoving all the formerly-global 
variables in the main function brings lower-bugs benefits.

>>I've done little with IMAP. What's in msgnums here? Eg:
>>     print(type(msgnums), repr(msgnums))
>>just so we all know what we're dealing with here.
>
><class 'list'> [b'']
>
>>>message_uuids = []
>>>for number in str(msgnums)[3:-2].split():
>>
>>This is very strange. [...]
>Yes, and I don't understand it.  'print(msgnums)' prints:
>
>[b'']
>
>when there are no messages and
>
>[b'1 2 3 4 5']

Chris has addressed this. msgnums is a list of the data components of the 
IMAP response.  By going str(msgnums) you're not getting "the message 
numbers as text", you're getting what printing a list prints. Which is 
roughly Python code: the brackets and the repr() of each list member.

Notice that the example code accessed msgnums[0] - that is the first 
data component, a bytes. That you _can_ convert to a string (under 
assumptions about the encoding).

By getting the "str" form of a list, you're forced into the weird [3:-2] 
hack to trim the ends. But it is just a hack for a transcription 
mistake, not a sane parse.
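
To make the difference concrete, here is a small sketch using the value 
you printed. The 'ascii' encoding is my assumption about what the server 
sends for message numbers.

```python
msgnums = [b'1 2 3 4 5']

# str() of a list is its printed form, brackets and b'' included.
as_printed = str(msgnums)

# The [3:-2] hack strips "[b'" and "']" from that printed form.
hacked = str(msgnums)[3:-2].split()

# The direct route: take the first data component and decode it.
numbers = msgnums[0].decode('ascii').split()

# With no messages the component is empty and split() gives [].
no_messages = [b''][0].decode('ascii').split()

print(as_printed)
print(numbers)
print(no_messages)
```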

>So I was guessing that it might be an array containing a single 
>string and that referring to the first element of the array turns into 
>a string with which split() can be used.  But 'print(msgnums[0].split())' 
>prints
>
>[b'1', b'2', b'3', b'4', b'5']

msgnums[0] is bytes. You can do most str things with bytes (because that 
was found to be often useful) but you get bytes back from those 
operations as you'd hope.
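
A quick sketch of what I mean (the sample values are invented): the 
str-like methods exist on bytes and hand back bytes, and you decode at 
the point where you actually want text.

```python
data = b'1 2 3'

parts = data.split()                 # bytes in, list of bytes out
upper = b'ok'.upper()                # still bytes
starts = data.startswith(b'1')       # the argument must be bytes too

# Decode each piece once you want str objects.
texts = [p.decode('ascii') for p in parts]

# Mixing the two types is an error rather than a silent conversion.
try:
    b'1' + '2'
    mixed_ok = True
except TypeError:
    mixed_ok = False
```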

>so I can only guess what that's supposed to mean: maybe an array of 
>many bytes?  The documentation[1] clearly says: "The message_set 
>options to commands below is a string [...]"

But that is the parameter to the _call_: your '(UID)' parameter.

>I also need to work with message uids rather than message numbers 
>because the numbers can easily change.  There doesn't seem to be a way 
>to do that with this library in python.

By asking for UIDs you're getting uids. Do they not work in subsequent 
calls?
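
As far as I can tell from the imaplib docs, the IMAP4 class has a uid() 
method which issues a command in its UID form. An untested sketch, 
assuming conn is a logged in connection with a mailbox already selected 
(the function name and the RuntimeError handling are mine):

```python
def fetch_all_by_uid(conn):
    """Return {uid: fetch data} for every message in the selected mailbox.

    conn is assumed to be an imaplib.IMAP4 or IMAP4_SSL instance.
    """
    typ, data = conn.uid('SEARCH', 'ALL')
    if typ != 'OK':
        raise RuntimeError('UID SEARCH failed: %r' % (data,))
    messages = {}
    for uid_bytes in data[0].split():
        uid = uid_bytes.decode('ascii')
        # Same UID goes straight back in; no parsing beyond split().
        typ, msg_data = conn.uid('FETCH', uid, '(RFC822)')
        if typ != 'OK':
            raise RuntimeError('UID FETCH %s failed: %r' % (uid, msg_data))
        messages[uid] = msg_data
    return messages
```

The data returned for each message is left exactly as imaplib hands it 
back; picking it apart is a separate job.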

>So it's all guesswork, and I gave up after a while and programmed what 
>I wanted in perl.  The documentation of this library sucks, and there 
>are worlds between it and the documentation for the libraries I used 
>with perl.

I think you're better off looking for another Python imap library. The 
imaplib was basic functionality to (a) access the protocol in basic form 
and (b) conceal the async stuff, since IMAP is an asynchronous protocol.

You can in fact subclass it to do better things. Other libraries might do 
that, or they might have written their own protocol implementations.

>That doesn't mean I don't want to understand why this is so unwieldy. 
>It's all nice and smooth in perl.

But using what library? Something out of CPAN? Those are third party 
libraries, not Perl's presupplied stuff. The equivalent for Python is 
pypi.org. Look there.

>>It is just breaking apart data[0] into strings which were separated by
>>whitespace in the response. And then using those same strings as keys
>>for the .fetch() call. That doesn't seem complex, and in fact is blind
>>to the format of the "message numbers" returned. It just takes what it
>>is handed and uses those to fetch each message.
>
>That's not what the documentation says.

The _example code_ is blind to them, whatever the semantics of the docs.  
It just gets the uids and fetches with them. Aside from .split(), 
there's no parsing or deep understanding of the uid.

>>IMAP's quite complex. Have you read RFC2060?
>>
>>     https://datatracker.ietf.org/doc/html/rfc2060.html
>
>Yes, I referred to it and it didn't become any more clear in 
>combination with the documentation of the python library.

IMAP's like that :-)

>>The imaplib library is probably a fairly basic wrapper for the
>>underlying protocol which provides methods for the basic client requests
>>and conceals the asynchronicity from the user for ease of (basic) use.
>
>Skip Montanaro seems to say that the byte problem comes from the 
>change from python 2 to 3 and there is a better library now: 
>https://pypi.org/project/IMAPClient/

And someone mentioned imaplib2. There are several choices.

>But the documentation seems even more sparse than the one for imaplib. 
>Is it a general thing with python that libraries are not well 
>documented?

That depends on the library - it is of course at the whim of the 
developer. Heavily used powerful libraries are usually well documented, 
eg numpy. A random hacker's published module? Might have nothing.

Wrappers for protocols like IMAP might have a bit of doco and expect the 
user to infer stuff from knowledge of the protocol itself.

[...]
>>Anyway, the IMAP response are bytes containing text. You get a lot of
>>bytes.
>
>Well, ok, but it's not helpful that b is being inserted like 
>everywhere,

Only when you _print_ them. That "b" is an indicator that this is a bytes 
object being printed in a stringlike form because that is often a useful 
representation. Nothing's inserting a "b" in the data itself.

>and I have to keep asking myself what I'm looking at because bytes are 
>bytes.

If you've got b'abc', that is a printout of a 3 byte string. _Not_ the 
string itself.
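
You can check that for yourself with something like this:

```python
data = b'abc'

shown = repr(data)            # what print() of a list member shows
length = len(data)            # the data really is 3 bytes
values = list(data)           # the byte values, no "b" among them
text = data.decode('ascii')   # the text, once decoded

print(shown, length, values, text)
```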

>Since the documentation is so bad, I had to figure it out by trial and 
>error and by printing stuff and making guesses and assumptions.  
>That's just not the way to program something.

No. But _many_ modules are what the original author needed to get 
something done, and neither complete nor perfectly documented. Life's 
too short. Well used modules tend to become more complete, elegant and 
documented over time, _if_ people other than the author use them.

>>When you go:
>>
>>     text = str(data)
>>
>>that is _assuming_ a particular text encoding stored in the data. You
>>really ought to specify an encoding here. If you've not specified the
>>CHARSET for things, 'ascii' would be a conservative choice. The IMAP RFC
>>talks about what to expect in section 4 (Data Formats). There's quite a
>>lot of possible response formats and I can understand imaplib not
>>getting deeply into decoding these.
>
>UTF8 is the default since quite a while now.  Why doesn't it just use that?

"since quite a while now" is pretty vague. Look at the date on RFC2060, 
and remember that that is version 4 of the protocol, _after_ a time 
consuming standardisation process. So subtract several years from that.

Regardless, _assuming_ that arbitrary bytes are UTF8 is error prone and 
foolhardy and will lead to mojibake. If a protocol specifies a UTF8 
encoding, you're good.  Otherwise you need to find out what is actually 
going on. Some protocols are old enough to assume the world is ASCII.  
Some take a less rigid view that they're processing "text" at all, for 
all that their contents are effectively text.

Notice that the RFC specifies that strings are 7-bit text except in 
certain circumstances. And those strings are probably ASCII.
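
A small sketch of why guessing is risky (the sample word is invented, 
written with escapes so the source stays ASCII). Note that in Python 3 a 
plain str(data) on bytes does not decode at all, it gives you the 
printed form; you have to name an encoding to get text.

```python
word = 'caf\u00e9'                      # "cafe" with an acute accent

utf8_bytes = word.encode('utf-8')       # b'caf\xc3\xa9'
latin1_bytes = word.encode('latin-1')   # b'caf\xe9'

# No encoding given: you get the repr, "b" and quotes included.
printed_form = str(b'abc')

# Right encoding: the original text comes back.
good = utf8_bytes.decode('utf-8')

# Wrong guess, but decodable: mojibake.
mojibake = utf8_bytes.decode('latin-1')

# Wrong guess, not decodable: an exception.
try:
    latin1_bytes.decode('utf-8')
    failed = False
except UnicodeDecodeError:
    failed = True
```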

>Then its documentation should at least specify what the library does.

It mostly does, and that doing is relatively basic. Does it say it 
provides high level post parsed data structures? No. It describes what 
it gets back from the IMAP responses, and further/deeper parse is up to 
the user. Who is free to subclass the IMAP class and add such parsing.  
Or to find or write a more Pythonic module (==> one with more normal 
Python idioms and working in Python classes like str more than in bytes).  
But the _protocol_ passes bytes data around.
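
By way of illustration only, a subclass which adds a little parsing 
might look like the sketch below. The class name and the search_uids() 
method are invented here; only uid() comes from imaplib, and I have not 
run this against a real server.

```python
import imaplib


class UidIMAP4(imaplib.IMAP4):
    """Hypothetical subclass returning UIDs as ints instead of raw bytes."""

    def search_uids(self, *criteria):
        # Default to every message if no criteria are given.
        typ, data = self.uid('SEARCH', *(criteria or ('ALL',)))
        if typ != 'OK':
            raise RuntimeError('UID SEARCH failed: %r' % (data,))
        # data[0] is bytes such as b'3 7 42', or b'' for no messages.
        return [int(u.decode('ascii')) for u in data[0].split()]
```

The same method could be hung off imaplib.IMAP4_SSL instead if you 
connect over TLS.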

>>So having passed '(UID)' to the SEARCH request, you now need to parse
>>the response.
>
>First I have to guess what the response might be ...  And once I 
>managed that, there's still no way to do something with a message by 
>its uid.

You can print the response, then look at the RFC to understand what it 
is and how to parse it.

But this is grittiness you're not interested in, which means you need to 
look for a higher level library.

Cheers,
Cameron Simpson <cs at cskk.id.au>