imaplib: is this really so unwieldy?

Thu May 27 19:21:41 EDT 2021

On 27May2021 18:42, hw <hw at adminart.net> wrote:
>On 5/26/21 12:25 AM, Cameron Simpson wrote:
>>On 25May2021 19:21, hw <hw at adminart.net> wrote:
>>>On 5/25/21 11:38 AM, Cameron Simpson wrote:
>>>>On 25May2021 10:23, hw <hw at adminart.net> wrote:
>>You'd be surprised how useful it is to make almost any standalone
>>programme a module like this - in the medium term it almost always pays
>>off for me. Even just the discipline of shoving all the formerly-global
>>variables in the main function brings lower-bugs benefits.
>
>What do you do with it when importing it?  Do you somehow design your 
>programs as modules in some way that makes them usable as some kind of 
>library function?

Michael Torrie addressed this. In short, inspecting __name__ tells you 
if your module was run as a main programme or imported. That's what the 
test for '__main__' checks.

So modules like this tend to be:

    module code
    ....
    if __name__ == '__main__':
        run as a command line main programme

And I personally spell that last line as:

    sys.exit(main(sys.argv))

and define a main() function near the top of the module.

>>Chris has addressed this. msgnums is a list of the data components of
>>the IMAP response.  By going str(msgnums) you're not getting "the message
>>numbers as text" you're getting what printing a list prints. Which is
>>roughly Python code: the brackets and the repr() of each list member.
>
>Well, of course I expect to get message numbers returned from such a 
>library function, not the raw imap response.  What is the library for 
>when I have to figure it all out by myself anyway.

The library, for this particular library:
- conceals the async side, giving you plain request=>response function 
  calls
- parses the response in its generic structural form and hands you the 
  components

It's a relatively thin shim for the IMAP protocol itself, not a fully 
fledged mailbox parser.

Since the search function can return many weird and wonderful things 
depending on what you ask it for, this particular library (a) does not 
parse the search field parameter ('(UID)') or the responses, it just 
passes them through, and (b) correspondingly does not interpret the 
result beyond unpacking the IMAP response into the data packets 
provided.

What you really want is a "search_uids()" function, which calls 
.search() with '(UID)' _and_ parses the particular style of response 
that '(UID)' produces from a server. imaplib doesn't have that, and that 
is a usability deficiency. It could do with a small suite of 
convenience/helper functions that call the core IMAP methods in 
particular commonly used ways.
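
A minimal sketch of such a helper follows. The names search_uids and 
FakeConnection are made up for illustration; neither is part of imaplib. 
It also assumes that the connection's uid() method with 'SEARCH' is the 
call that makes the server answer with UIDs, which is a different route 
from passing '(UID)' to .search(), so treat it as one possible shape 
rather than the definitive one:

```python
def search_uids(conn, criteria="ALL"):
    """Return the UIDs matching `criteria` as a list of bytes tokens."""
    # uid() issues the IMAP "UID SEARCH" command and returns (status, data)
    typ, data = conn.uid("SEARCH", None, criteria)
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed: %r" % (data,))
    # data is a list of data components; the first holds the space
    # separated UIDs as one bytes object (treat a missing one as empty)
    return (data[0] or b"").split()


class FakeConnection:
    """Hypothetical stand-in for imaplib.IMAP4 with a canned reply."""

    def uid(self, command, *args):
        return "OK", [b"123 456 789"]


uids = search_uids(FakeConnection())
print(uids)
```

With a real server you would pass a logged in imaplib.IMAP4 or IMAP4_SSL 
object with a mailbox selected in place of the fake connection.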

>>Notice that the example code accessed msgnums[0] - that is the first
>>data component, a bytes. That you _can_ convert to a string (under
>>assumptions about the encoding).
>
>And of course, I don't want to randomly convert bytes into strings ...

Well, you don't need to. Splitting on "whitespace" is enough - you then 
have a list of individual UID bytes objects, which can probably be 
passed around opaquely - you don't ordinarily need to _care_ that 
they're bytes transcriptions of the server numeric UID _values_ - you 
can at that point just treat them as tokens.

>>By getting the "str" form of a list, you're forced into the weird [3:-2]
>>hack to trim the ends. But it is just a hack for a transcription
>>mistake, not a sane parse.
>
>Right, that's why I don't like it and is part of what makes it so unwieldy.

Bear with me - I'm going to be quite elaborate here.

That _particular_ problem is because you transcribed a 
list-of-bytes-objects as a string (as Python would have printed such a 
thing with the print() function). It's just the wrong thing to do here, 
regardless of the language.

Consider (at the Python interactive ">>> " prompt):

    >>> nums = [1, 2, 3]

A list of ints. print(nums) outputs:

    >>> print(nums)
    [1, 2, 3]

print() works by writing out str() of each of its arguments. So the 
above is a string written to the output. Looking at str():

    >>> str(nums)
    '[1, 2, 3]'

It's a string, with a leading '[' character, a decimal transcription of 
the numeric values with ', ' between those transcriptions, and a 
trailing ']' character.

The single quotes above are the interactive interpreter printing that 
Python str value using repr(), as you would type it to a Python 
programme.

To do what you did with the msgnums I'd go:

    str(nums)[1:-1].split(', ')

which would get me a list of str values. But that cumbersomeness is 
because of the str(nums), which turned nums into a string transcription 
of the list of values. Let's do the equivalent with a made up IMAP 
search response:

    msgnums = [b'123 456 789']

The above, I gather, is the source of your [3:-2] thing: trim the "[b'" 
and "']" markers.
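
To see the difference between the two routes side by side, here is a 
small sketch using the same made up response (the variable names as_text, 
trimmed and direct are only for illustration):

```python
msgnums = [b'123 456 789']      # made up search response data

as_text = str(msgnums)          # transcription of the whole list
print(as_text)

trimmed = as_text[3:-2]         # the hack: drop "[b'" and "']"
direct = msgnums[0]             # the bytes object itself, nothing to trim

print(trimmed.split())          # a list of str
print(direct.split())           # a list of bytes
```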

I'm using msgnums here because you named it that way, but the response 
from a search() is actually a list of data components. For 
search('(UID)') that list has one element, the bytes data component 
with UIDs transcribed within it.

This structured response is what you were not expecting. Any IMAP 
response, syntactically, contains multiple data parts. Depending on the 
call those parts might contain UIDs of messages or parts of messages, 
etc.

So...

Your particular .search() returns _one_ response. But the library is 
handing you a generic form - there _could_ be multiple responses. You did 
a search for '(UID)', so you get one response with these UIDs:

    123 456 789

but it is a bytes object (I'll get to the IMAP standard's "string" term 
versus a Python str later; they're not synonyms).

So the bytes with the UIDs is msgnums[0] in the search() result. You went 
str(msgnums), which transcribes a list as you would write it in Python 
code. You actually want (a) msgnums[0], which is the bytes containing 
the UIDs and (b) to parse those bytes into UIDs for use later.

So:

    uids_bs = msgnums[0]    # get the data part you want
    uids = uids_bs.split()  # split it on whitespace, pretending it is like ASCII

That works because the IMAP RFC specifies that the result of a '(UID)' 
search is an IMAP string containing space separated UIDs.

At this point "uids" is a list of bytes objects, like this:

    [b'123', b'456', b'789']

I'm fairly sure you don't need to care that these are representations of 
numbers at this point - you could pass them around as tokens. Let me 
read from the imaplib module doco:

    IMAP4 Objects

    All IMAP4rev1 commands are represented by methods of the same name, 
    either upper-case or lower-case.

    All arguments to commands are converted to strings, [...]

That means you _could_ pass in bytes later - treat the UIDs bytes 
objects as tokens. _Or_ you could convert them to ints:

    uid_bses = msgnums[0].split()
    uids = [ int(uid_bs.decode('ascii')) for uid_bs in uid_bses ]

Courtesy of the "All arguments to commands are converted to strings" 
above, probably you can use them as ints in subsequent calls and imaplib 
will turn them into strs for the protocol, likely using str().
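
One thing to watch when converting bytes tokens to ints: str() of a 
bytes object gives its repr (the b'...' form), not the digits. A short 
sketch with made up data showing which routes work:

```python
uid_bses = b'123 456 789'.split()

# int() accepts ASCII decimal digits in a bytes object directly
uids = [int(uid_bs) for uid_bs in uid_bses]

# decoding first is equivalent and makes the assumed encoding explicit
uids_via_str = [int(uid_bs.decode('ascii')) for uid_bs in uid_bses]

# str() on a bytes object yields text like "b'123'", which int() rejects
try:
    int(str(uid_bses[0]))
    str_route_ok = True
except ValueError:
    str_route_ok = False

print(uids, uids_via_str, str_route_ok)
```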

>>>So I was guessing that it might be an array containing a single a
>>>string and that refering to the first element of the array turns into
>>>a string with which split() can used.  But 'print(msgnums[0].split())'
>>>prints
>>>
>>>[b'1', b'2', b'3', b'4', b'5']

Yes, but that's _printed_. It is just an array of bytes values:

    b'1'
    b'2'
    b'3'
    b'4'
    b'5'

and _those_ are just how they'd be _printed_. They're just single 
byte values.

>As someone unfamiliar with python, I was wondering what this output 
>means.  It could be items of an array that are bytes containing 
>numbers, like, in binary, 00000001, 00000010, and so on.  That's more 
>like what I would expect and not something I would want to convert into 
>a string.

Yeah, this is just transcription, as you would need to express these 
values in a Python programme. The values themselves _are_ what you think 
above, but more precisely:

    0b00110001
    0b00110010
    0b00110011
    0b00110100
    0b00110101

("0b" is the Python numeric prefix indicating binary notation.) That is 
because these bytes are still "text": the decimal digits '1' and so 
forth in ASCII.  IMAP itself returns you text (well, bytes containing 
text, that text containing decimal transcriptions of UID numeric values).
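
You can check this at the prompt. A small sketch with made up data: 
indexing a bytes object gives you the integer value of that byte, and 
format() can show it in binary:

```python
data = b'1 2 3 4 5'             # made up search response data

# tok[0] is the integer value of the token's first (here only) byte
values = [tok[0] for tok in data.split()]
binary = [format(value, '#010b') for value in values]

for tok, value, bits in zip(data.split(), values, binary):
    print(tok, value, bits)
```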

If you want the numeric values you want this transformation of the 
.search() response:

    [0]         get the first (and only) data bytes response

    .split()    treat it like ASCII and break it up on spaces,
                producing a list of bytes objects, one per value

    [ int(uid_bs.decode('ascii')) for uid_bs in uid_bses ]
                this is probably optional - the bytes objects above are 
                probably useable directly later:
                for each bytes object, decode it to a str and convert 
                that str to an int
                That would get you numeric values from decimal bytes 
                transcriptions. I still suspect you needn't bother with 
                this with the imaplib module.

After those, you have a list of UID-as-bytes or UID-as-int depending on 
whether you did step 3 above.

>>>so I can only guess what that's supposed to mean: maybe an array of
>>>many bytes?  The documentation[1] clearly says: "The message_set
>>>options to commands below is a string [...]"
>>
>>But that is the parameter to the _call_: your '(UID)' parameter.
>
>No, it's not a uid.  With this library, I haven't found a way to get uids. 

I describe this in detail above. I agree a .search_uids() convenience 
method would be nice, but it's a 3 line function you could write 
yourself.

>The function to call requires a string, not bytes or an array of bytes.

No. The library docs say it converts arguments _to_ strings for you. You 
could (probably) pass it ints, strs or bytes for UIDs and all would 
work.

>The included example contradicts the documentation and leaves 
>potential users of the library to guessing.

No, you're missing the conversion facility documented right at the top 
of the IMAP object specs.

>>>I also need to work with message uids rather than message numbers
>>>because the numbers can easily change.  There doesn't seem to be a way
>>>to do that with this library in python.
>>
>>By asking for UIDs you're getting uids. Do they not work in subsequent
>>calls?
>
>How do you force the library to give uids instead of message numbers, 
>and how do you make it do something with a particular message by uid?

You force the _protocol_ to give you UIDs by passing '(UID)' to the 
search function: that tells the IMAP _server_ to return UIDs as the 
search result.

You can then pass the UID you receive to other functions which will 
accept a UID. As far as I recall, the IMAP protocol does not give UIDs a 
distinct numeric range from message numbers, so the server cannot tell 
them apart on its own. Instead the protocol has a UID command prefix 
(UID FETCH, UID STORE and so on), which imaplib exposes as its .uid() 
method, so you do need to say explicitly that you're using UIDs rather 
than message numbers.

>>You can in fact subclass it to do better things. Other libraries might
>>do that, or they might have written their own protocol implementations.
>
>I haven't really found a better one.

That's a shame. There's lots of Python stuff using IMAP out there. For 
example the getmail and OfflineIMAP tools both speak IMAP using Python. 
Maybe look at them.

>>>That doesn't mean I don't want to understand why this is so unwieldy.
>>>It's all nice and smooth in perl.
>>
>>But using what library? Something out of CPAN? Those are third party
>>libraries, not Perl's presupplied stuff.
>The libraries I used with perl all come with Fedora, so they are 
>presupplied and only need to be installed.  You can find them on cpan 
>as well.  It's very convenient.

Here you are confused. Fedora is a Linux distro, and supplies many third 
party CPAN modules _as_ perl-this perl-that packages. You will probably 
find it _also_ supplies many PyPI modules as python-this python-that 
etc.

They're _still_ third party packages not part of the core Perl or Python 
languages themselves. Fedora's doing a lot of work here for you.

>>The equivalent for Python is
>>pypi.org. Look there.
>
>Yes, someone pointed it out, and I did.  They have an imap-tools 
>library which looks most promising at first, and its documentation 
>tells me that you can do some IMAP operations with messages --- and 
>nothing with messages themselves, like getting from the server.
>
>So it seems that IMAP support through python is virtually non-existent.

This still surprises me, but I've not tried to use IMAP seriously. I 
read email locally, and collect it with POP instead. With a tool I wrote 
myself in Python, as it happens.

[...]
>There is no telling what might happen to the messages when using 
>functions that randomly convert bytes in weird ways and that contradict 
>the documentation.  Why would I trust this library not to accidentally 
>delete all my emails or to not do other harmful things?

Because you're not calling the .delete() method? Just guessing here.

[...]
>>>and I have to keep asking myself what I'm looking at because bytes
>>>are bytes.
>>
>>If you've got b'abc', that is a printout of a 3 byte string. _Not_ the
>>string itself.
>
>Who says it's a string?
>
>RFC 3501 is quite specific in section 2.2: "All interactions 
>transmitted by client and server are in the form of lines, that is, 
>strings that end with a CRLF.", and it describes what strings are in 
>section 4.3.

Yeah, but read it again. When the IMAP RFC says strings, it means a 
sequence of octets suitably delineated. It DOES NOT say the strings are 
_text_. They come in 2 forms: a "quoted string" delineated by ASCII 
double quote bytes and a "literal", a length-prefixed form where the 
length of the string of octets is specified by a leading {length} ASCII 
length indicator.

They are NOT TEXT STRINGS.

Many of them _contain_ text.
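
To make the length-prefixed form concrete, here is a rough sketch of 
pulling one such string out of a buffer. The read_literal function is a 
hypothetical helper written for this illustration (imaplib does this 
unpacking for you internally), and the sample bytes are made up:

```python
def read_literal(buf):
    """Split an IMAP literal, '{N}' CRLF then N octets, from what follows."""
    header, sep, body = buf.partition(b'\r\n')
    if not (sep and header.startswith(b'{') and header.endswith(b'}')):
        raise ValueError("not an IMAP literal: %r" % (buf,))
    length = int(header[1:-1])      # the ASCII decimal octet count
    if len(body) < length:
        raise ValueError("literal truncated")
    # exactly `length` octets belong to the string, whatever they contain
    return body[:length], body[length:]


# six arbitrary octets, including non-ASCII and a newline, then more data
octets, rest = read_literal(b'{6}\r\n\xff\x00abc\n FLAGS')
print(octets, rest)
```

Note that nothing here decodes the octets: the length says how many 
there are, not what they mean.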

>I would say a library that gives me responses from an IMAP server as 
>bytes --- which could be anything --- rather than strings is not 
>giving me what I can expect and probably does something wrong.

It isn't giving you what you expect. But by _not_ deeply parsing the 
octet strings returned from the server, it AVOIDS doing something wrong 
- it leaves those accidents to you.

Bytes are bytes, but that means they're meaningless without context.  
IMAP "strings" are strings of octets - arbitrary binary data. The octet 
strings for particular responses have defined transcriptions of things 
in them.

So when you call .search() you get back a single octet string, and 
Python represents such a string as a Python bytes object, which is just 
a string of octets.

You only get to decode that as a space separated ASCII transcription of 
decimal transcribed UID numeric values _because_ you passed '(UID)' to 
the search() call. If you'd passed something else the data response 
would contain differently shaped binary data requiring different 
decoding.

The imaplib library does not parse the '(UID)' (or whatever) search 
parameter itself and perform a matching response decode - the 
possibilities are quite complex.

>Right.  Fortunately, perl offers good libraries for the purpose, so I 
>used those instead.

And that's perfectly reasonable.

>Fortunately, the perl libraries just gave me uids when I asked for 
>them, and I didn't have to worry about anything :)

That is convenient. Good to hear.

>Right, and this isn't good for someone who has just begun to learn 
>python and has no idea what he does.  Maybe I'll find something better 
>suited to learning.

Yes, I think so. The fairly basic imaplib module combined with the quite 
complex IMAP protocol is a poor starting point - there's just too much 
going on to use it as a starting-to-learn-Python scenario.

Oh, when you go to xmpp etc, look at the Python "requests" module (PyPI 
again). Python has a urllib module shipped, but it again is fairly 
basic. The "requests" module is almost universally what people use to do 
HTTP stuff. "pip install requests" and off you go.

Cheers,
Cameron Simpson <cs at cskk.id.au>