Python 3 how to convert a list of bytes objects to a list of strings?

Cameron Simpson cs at cskk.id.au
Fri Aug 28 16:39:54 EDT 2020


On 28Aug2020 12:26, Chris Green <cl at isbd.net> wrote:
>Cameron Simpson <cs at cskk.id.au> wrote:
>> POP3 is presumably handing you bytes containing a message. If the
>> Python email.BytesParser doesn't handle it, stash the raw bytes
>> _elsewhere_ in a distinct file in some directory.
>>
>>     with open('evil_msg_bytes', 'wb') as f:
>>         for bs in bbb:
>>             f.write(bs)
>>
>> No interpretation required, since parsing failed. Then you can start
>> dealing with these exceptions. _Do not_ write unparsable messages into
>> an mbox!
>>
>Maybe I shouldn't but Python 2 has been managing to do so for several
>years without any issues.  I know I *could* put the exceptions in a
>bucket somewhere and deal with them separately but I'd really rather
>not.
>
>At present (with the Python 2 code still installed) it all 'just works'
>and the absolute worst corruption I ever see in an E-Mail is things
>like accented characters missing altogether or £ signs coming out as a
>funny looking string.  Neither of these really makes the message
>unintelligible.
>
>Are we saying that Python 3 really can't be made to handle things
>'tolerantly' like Python 2 used to?

It can, but if you're decoding bytes to strings without the correct
encoding then you get rubbish. The same thing happens in Python 2; it
just isn't flagged there.
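
As a rough illustration of what "tolerant" decoding can look like in
Python 3, here is a minimal sketch. The sample bytes are made up (a
Latin-1 pound sign that is not valid UTF-8), and UTF-8 is only an
assumed guess at the encoding, not something known about your mail:

```python
# Hypothetical sample: "£5" encoded as Latin-1, which is not valid UTF-8.
raw = b'Cost: \xa35\n'

# Strict decoding (the Python 3 default) raises on the bad byte.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes U+FFFD for each undecodable byte:
# lossy, but it never raises.
replaced = raw.decode('utf-8', errors='replace')

# errors='surrogateescape' carries the bad bytes through as lone
# surrogates, so encoding with the same handler restores the original.
escaped = raw.decode('utf-8', errors='surrogateescape')
roundtrip = escaped.encode('utf-8', errors='surrogateescape')
```

Either handler keeps the code running, but 'replace' throws data away,
which is roughly the silent damage Python 2 was doing.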

One approach would be to break your parser process up:

- collect bytes from POP3
- parse headers for filtering purposes
- those you keep, append to the mbox _as bytes_
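
The header-only step might be sketched like this. The function names
and the List-Id test are hypothetical stand-ins for whatever your real
filter does; the point is only that the body never has to be decoded:

```python
from email import policy
from email.parser import BytesParser

def parse_headers(msg_bytes):
    """Parse only the header block; the body is left uninterpreted."""
    parser = BytesParser(policy=policy.compat32)
    return parser.parsebytes(msg_bytes, headersonly=True)

def keep_message(msg_bytes, wanted='python-list'):
    """Hypothetical filter: keep messages whose List-Id mentions `wanted`."""
    headers = parse_headers(msg_bytes)
    # str() copes with both plain strings and email.header.Header objects.
    list_id = str(headers.get('List-Id', ''))
    return wanted in list_id
```

The original msg_bytes is untouched by this, so it is still available
to be written out exactly as received.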

It sounds like your filter is uninterested in the message body, so you
don't need to decode it at all. Just ensure the body has no embedded
lines with b'From ' at the start, and ensure the last line ends in a
newline b'\n' or that you append one, so that the b'From ' of the next
message is recognised.
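
A minimal sketch of that bytes-only append, under some assumptions:
the function names, the '>' prefix used to escape embedded From lines,
and the placeholder sender in the separator line are my choices for
illustration, not anything your existing code is known to do:

```python
import time

def mbox_safe(msg_bytes):
    """Escape embedded b'From ' lines and guarantee a trailing newline."""
    out = []
    for line in msg_bytes.splitlines(keepends=True):
        if line.startswith(b'From '):
            line = b'>' + line  # conventional mbox escaping
        out.append(line)
    data = b''.join(out)
    if not data.endswith(b'\n'):
        data += b'\n'
    return data

def append_to_mbox(path, msg_bytes, sender=b'MAILER-DAEMON'):
    """Append one message to an mbox file without decoding any of it."""
    stamp = time.asctime(time.gmtime()).encode('ascii')
    with open(path, 'ab') as f:
        # Separator line marking the start of a new message.
        f.write(b'From ' + sender + b' ' + stamp + b'\n')
        f.write(mbox_safe(msg_bytes))
```

Nothing here calls decode(), so a message in an unknown or broken
encoding goes into the mbox byte for byte, apart from the escaping.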

So: collect bytes, decode headers and parse/filter, save _bytes_.

Cheers,
Cameron Simpson <cs at cskk.id.au>

