How do I do this in Python 3 (string.join())?

Chris Green cl at isbd.net
Thu Aug 27 09:36:01 EDT 2020


Cameron Simpson <cs at cskk.id.au> wrote:
> On 27Aug2020 09:16, Chris Green <cl at isbd.net> wrote:
> >Cameron Simpson <cs at cskk.id.au> wrote:
> >> But note: joining bytes like strings is uncommon, and may indicate that
> >> you should be working in strings to start with. Eg you may want to
> >> convert popmsg from bytes to str and do a str.join anyway. It depends on
> >> exactly what you're dealing with: are you doing text work, or are you
> >> doing "binary data" work?
> >>
> >> I know many network protocols are "bytes-as-text", but that is
> >> accomplished by implying an encoding of the text, eg as ASCII, where
> >> characters all fit in single bytes/octets.
> >>
> >Yes, I realise that making everything a string before I start might be
> >the 'right' way to do things but one is a bit limited by what the mail
> >handling modules in Python provide.
> 
> I do ok, though most of my message processing happens to messages 
> already landed in my "spool" Maildir by getmail. My setup uses getmail 
> to get messages with POP into a single Maildir, and then I process the 
> message files from there.
> 
Most of my mail is delivered by SMTP; I run a Postfix SMTP *server*
on my desktop machine, which stays on permanently.

The POP3 processing is solely to collect E-Mail that ends up in the
'catchall' mailbox on my hosting provider.  It empties the POP3
catchall mailbox, checks for anything that *might* be for me or other
family members then just deletes the rest.

> >E.g. in this case the only (well the only ready made) way to get a
> >POP3 message is using poplib and this just gives you a list of lines
> >made up of "bytes as text" :-
> >
> >    popmsg = pop3.retr(i+1)
> 
> Ok, so you have bytes? You need to know.
> 
The documentation says (and it's exactly the same for Python 2 and
Python 3):-

    POP3.retr(which)
        Retrieve whole message number which, and set its seen flag. Result
        is in form (response, ['line', ...], octets).

Which isn't amazingly explicit unless 'line' implies a string.
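
For what it's worth, here is a minimal sketch of what Python 3 hands back
and how the lines can be joined. It assumes each 'line' is a bytes object
with its line terminator stripped, which is how poplib behaves in Python 3
as far as I can tell. The tuple is a made-up stand-in for a real
pop3.retr() result, since a real call needs a live server, and the
addresses and octet count are purely illustrative.

```python
# Simulated result of POP3.retr(); a real call needs a live POP3 server.
# Assumption: in Python 3 the line list holds bytes objects, not str.
response, lines, octets = (
    b"+OK message follows",
    [b"From: someone@example.com", b"Subject: test", b"", b"Hello"],
    0,  # illustrative only, not a real octet count
)

# bytes has its own join(), so the separator must be bytes as well.
raw_message = b"\n".join(lines)

# By contrast, "\n".join(lines) raises TypeError, because str.join()
# refuses bytes items.
```

So the Python 2 idiom only needs a b prefix on the separator, as long as
whatever consumes the result is happy to take bytes.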


> >I join the lines to feed them into mailbox.mbox() to create a mbox I
> >can analyse and also a message which can be sent using SMTP.
> >
> >Should I be converting to string somewhere?
> 
> I have not used poplib, but the Python email modules have a BytesParser, 
> which gets you a Message object; I would feed the poplib bytes to that 
> to parse the received message.  A Message object can then be transcribed 
> as text via its .as_string method. Or you can do other things with it.
> 
> I think my main points are:
> 
> - know whether you're using bytes (uninterpreted data) or text (strings 
>   of _characters_); treating bytes _as_ text implies an encoding, and 
>   when that assumption is incorrect you get mojibake[1]
> 
> - look at the email modules' parsers, which return Messages, a 
>   representation of the message in a structure (so that MIME subparts 
>   etc are correctly broken out, and the character sets are _known_, post 
>   parse)
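
The BytesParser suggestion above might be sketched roughly as follows. The
message bytes are a hypothetical stand-in for the poplib lines joined with
b"\r\n", and policy.default is my own choice here rather than anything
specified in this thread.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical raw message, standing in for b"\r\n".join(lines) from poplib.
raw_message = (
    b"From: someone@example.com\r\n"
    b"To: chris@example.com\r\n"
    b"Subject: =?utf-8?q?caf=C3=A9?=\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"Hello\r\n"
)

# Parse the bytes into a structured message object.
msg = BytesParser(policy=policy.default).parsebytes(raw_message)

# With policy.default, encoded headers come back decoded as text.
subject = msg["Subject"]

# The body is decoded to str using the charset declared in the message.
body = msg.get_content()

# The whole message can be transcribed back to text when needed.
text = msg.as_string()
```

The point of going through the parser is that the bytes-to-text decision
is made per header and per part from what the message declares, rather
than by assuming one encoding for the whole thing.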

OK, thanks Cameron.
 
-- 
Chris Green


More information about the Python-list mailing list