Python 3 how to convert a list of bytes objects to a list of strings?

Chris Green cl at isbd.net
Fri Aug 28 07:26:07 EDT 2020


Cameron Simpson <cs at cskk.id.au> wrote:
> On 28Aug2020 08:56, Chris Green <cl at isbd.net> wrote:
> >Stefan Ram <ram at zedat.fu-berlin.de> wrote:
> >> Chris Angelico <rosuav at gmail.com> writes:
> >> >But this is a really good job for a list comprehension:
> >> >sss = [str(word) for word in bbb]
> >>
> >>   Are you all sure that "str" is really what you all want?
> >>
> >Not absolutely, you no doubt have been following other threads related
> >to this one.  :-)
> 
> It is almost certainly not what you want. You want some flavour of 
> bytes.decode. If the BytesParser doesn't cope, you may need to parse the 
> headers as some kind of text (eg ISO8859-1) until you find a 
> content-transfer-encoding header (which still applies only to the body, 
> not the headers).
> 
> >> |>>> b = b"b"
> >> |>>> str( b )
> >> |"b'b'"
> >>
> >>   Maybe try to /decode/ the bytes?
> >>
> >> |>>> b.decode( "ASCII" )
> >> |'b'
> >>
> >>
> >Therein lies the problem, the incoming byte stream *isn't* ASCII, it's
> >an E-Mail message which may, for example, have UTF-8 or other encoded
> >characters in it.  Hopefully it will have an encoding given in the
> >header but that's only if the sender is 'well behaved', one needs to
> >be able to handle almost anything and it must be done without 'manual'
> >interaction.
> 
> POP3 is presumably handing you bytes containing a message. If the Python 
> email.BytesParser doesn't handle it, stash the raw bytes _elsewhere_ in 
> a distinct file in some directory.
> 
>     with open('evil_msg_bytes', 'wb') as f:
>         for bs in bbb:
>             f.write(bs)
> 
> No interpretation required, since parsing failed. Then you can start 
> dealing with these exceptions. _Do not_ write unparsable messages into 
> an mbox!
> 
Maybe I shouldn't but Python 2 has been managing to do so for several
years without any issues.  I know I *could* put the exceptions in a
bucket somewhere and deal with them separately but I'd really rather
not.

At present (with the Python 2 code still installed) it all 'just works'
and the absolute worst corruption I ever see in an E-Mail is things
like accented characters missing altogether or £ signs coming out as a
funny looking string.  Neither of these really makes the message
unintelligible.

Are we saying that Python 3 really can't be made to handle things
'tolerantly' like Python 2 used to?
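
To make concrete what 'tolerantly' might mean here, the following is only a
minimal sketch, not a tested replacement for the Python 2 code.  It assumes
the list of bytes objects is called bbb (as earlier in the thread), guesses
UTF-8 as the encoding, and uses a hypothetical helper name,
decode_tolerantly.  Passing errors="replace" to bytes.decode substitutes
U+FFFD for any bytes that are not valid in the chosen encoding instead of
raising UnicodeDecodeError.

```python
def decode_tolerantly(chunks, encoding="utf-8"):
    """Decode each bytes object, replacing undecodable bytes with U+FFFD.

    The helper name and the UTF-8 default are assumptions made for
    illustration; they do not come from the original code.
    """
    return [chunk.decode(encoding, errors="replace") for chunk in chunks]


# Sample data (invented): one valid UTF-8 line, and one line holding a
# Latin-1 pound sign (0xA3), which is not valid UTF-8 on its own.
bbb = [b"Subject: caf\xc3\xa9\r\n", b"Price: \xa35\r\n"]
sss = decode_tolerantly(bbb)

# Alternative: ISO8859-1 maps every byte value to a character, so this
# decode can never fail, although non-Latin-1 text will come out garbled.
latin = [chunk.decode("iso8859-1") for chunk in bbb]
```

With errors="replace" the stray pound sign comes out as a replacement
character rather than stopping the program, which is roughly the 'funny
looking string' level of damage described above.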

-- 
Chris Green