How do I do this in Python 3 (string.join())?

Cameron Simpson cs at cskk.id.au
Thu Aug 27 05:30:15 EDT 2020


On 27Aug2020 09:16, Chris Green <cl at isbd.net> wrote:
>Cameron Simpson <cs at cskk.id.au> wrote:
>> But note: joining bytes like strings is uncommon, and may indicate that
>> you should be working in strings to start with. Eg you may want to
>> convert popmsg from bytes to str and do a str.join anyway. It depends on
>> exactly what you're dealing with: are you doing text work, or are you
>> doing "binary data" work?
>>
>> I know many network protocols are "bytes-as-text", but that is
>> accomplished by implying an encoding of the text, e.g. as ASCII, where
>> characters all fit in single bytes/octets.
>>
>Yes, I realise that making everything a string before I start might be
>the 'right' way to do things but one is a bit limited by what the mail
>handling modules in Python provide.

I do ok, though most of my message processing happens to messages 
already landed in my "spool" Maildir by getmail. My setup uses getmail 
to get messages with POP into a single Maildir, and then I process the 
message files from there.

>E.g. in this case the only (well the only ready-made) way to get a
>POP3 message is using poplib and this just gives you a list of lines
>made up of "bytes as text" :-
>
>    popmsg = pop3.retr(i+1)

OK, so you have bytes? You need to know which one you're getting.
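For illustration, a small sketch of joining the lines that poplib hands back. It assumes the result shape that POP3.retr() is documented to return, (response, [line, ...], octets), with each line as bytes and its line ending already stripped. The message content below is made up so the snippet runs without a POP server. The point is that the separator has to be bytes too.

```python
# Hypothetical stand-in for the result of pop3.retr(i+1):
# (response, [line, ...], octets), where every line is bytes.
popmsg = (
    b"+OK message follows",
    [
        b"From: alice@example.com",
        b"To: bob@example.com",
        b"Subject: caf\xc3\xa9",
        b"",
        b"Hello Bob.",
    ],
    0,  # octet count; made up and not used here
)

response, lines, octets = popmsg

# bytes.join needs a bytes separator, so rejoin with b"\r\n".
raw = b"\r\n".join(lines) + b"\r\n"
print(type(raw).__name__)

# A str separator cannot join bytes items: this raises TypeError.
try:
    "\r\n".join(lines)
except TypeError as exc:
    print("str.join on bytes fails:", exc)
```

Joining with b"\r\n" restores the CRLF line endings POP3 uses on the wire; the email module's parsers also accept plain b"\n".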

>I join the lines to feed them into mailbox.mbox() to create an mbox I
>can analyse and also a message which can be sent using SMTP.
>
>Should I be converting to string somewhere?

I have not used poplib, but the Python email modules have a BytesParser, 
which gets you a Message object; I would feed the poplib bytes to that 
to parse the received message.  A Message object can then be transcribed 
as text via its .as_string method. Or you can do other things with it.
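A minimal sketch of that route, assuming the raw message has already been joined into a single bytes object (the message below is invented). It uses email.parser.BytesParser with policy.default, which returns an EmailMessage whose get_content() decodes a text body using the charset declared in the message headers.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical raw message, e.g. the result of b"\r\n".join(lines).
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: greetings\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Caf\xc3\xa9 at noon?\r\n"
)

# Parse the bytes into a structured message object.
msg = BytesParser(policy=policy.default).parsebytes(raw)

subject = msg["Subject"]             # header value, usable as a str
charset = msg.get_content_charset()  # charset named in Content-Type
body = msg.get_content()             # body decoded to str using that charset

print(subject)
print(charset)
print(body.strip())
```

Once parsed, the character set is known from the message itself rather than guessed, which is what makes it safe to work with the body as str.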

I think my main points are:

- know whether you're using bytes (uninterpreted data) or text (strings 
  of _characters_); treating bytes _as_ text implies an encoding, and 
  when that assumption is incorrect you get mojibake[1]

- look at the email modules' parsers, which return Messages, a 
  representation of the message in a structure (so that MIME subparts 
  etc. are correctly broken out, and the character sets are _known_, post 
  parse)
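To make the first point concrete, a short sketch with invented data: the same bytes decoded with the encoding that produced them, with a wrong guess, and as ASCII. Latin-1 can decode any byte sequence, so the wrong guess raises no error; it just yields mojibake. ASCII at least fails loudly.

```python
# Hypothetical UTF-8 bytes, e.g. a line received from a POP server.
data = "Café".encode("utf-8")   # b'Caf\xc3\xa9'

right = data.decode("utf-8")    # the encoding the sender actually used
wrong = data.decode("latin-1")  # a wrong guess that still "succeeds"

print(right)
print(wrong)  # mojibake: two characters where one was intended

# ASCII cannot represent the bytes above, so this raises instead of guessing.
try:
    data.decode("ascii")
except UnicodeDecodeError:
    print("ascii cannot decode", data)
```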

[1] https://en.wikipedia.org/wiki/Mojibake

Cheers,
Cameron Simpson <cs at cskk.id.au>

