Python 3 how to convert a list of bytes objects to a list of strings?

Richard Damon Richard at Damon-Family.org
Sat Aug 29 13:37:33 EDT 2020


On 8/29/20 11:50 AM, Chris Green wrote:
> Chris Green <cl at isbd.net> wrote:
>> Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
>>> On Fri, 28 Aug 2020 12:26:07 +0100, Chris Green <cl at isbd.net> declaimed the
>>> following:
>>>
>>>
>>>
>>>> Maybe I shouldn't but Python 2 has been managing to do so for several
>>>> years without any issues.  I know I *could* put the exceptions in a
>>>> bucket somewhere and deal with them separately but I'd really rather
>>>> not.
>>>>
>>>         In Python2 "string" IS BYTE-STRING. It is never UNICODE, and ignores
>>> any encoding.
>>>
>>>         So, for Python3, the SAME processing requires NOT USING "string" (which
>>> is now Unicode) and ensuring that all literals are b"stuff", and using the
>>> methods of the bytes data type.
>>>
>> Now I'm beginning to realise that *this* may well be what I need to
>> do, after going round in several convoluted circles! :-)
>>
> However the problem appears to be that internally in Python 3 mailbox
> class there is an assumption that it's being given 'ascii'.  Here's
> the error (and I'm doing no processing of the message at all):-
>
>     Traceback (most recent call last):
>       File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
>         mailLib.deliverMboxMsg(dest, msg, log)
>       File "/home/chris/.mutt/bin/mailLib.py", line 52, in deliverMboxMsg
>         mbx.add(msg)
>       File "/usr/lib/python3.8/mailbox.py", line 603, in add
>         self._toc[self._next_key] = self._append_message(message)
>       File "/usr/lib/python3.8/mailbox.py", line 758, in _append_message
>         offsets = self._install_message(message)
>       File "/usr/lib/python3.8/mailbox.py", line 830, in _install_message
>         self._dump_message(message, self._file, self._mangle_from_)
>       File "/usr/lib/python3.8/mailbox.py", line 215, in _dump_message
>         gen.flatten(message)
>       File "/usr/lib/python3.8/email/generator.py", line 116, in flatten
>         self._write(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 181, in _write
>         self._dispatch(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 214, in _dispatch
>         meth(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 432, in _handle_text
>         super(BytesGenerator,self)._handle_text(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 249, in _handle_text
>         self._write_lines(payload)
>       File "/usr/lib/python3.8/email/generator.py", line 155, in _write_lines
>         self.write(line)
>       File "/usr/lib/python3.8/email/generator.py", line 406, in write
>         self._fp.write(s.encode('ascii', 'surrogateescape'))
>     UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
>
> Any message with other than ASCII in it is going to have bytes >128
> unless it's encoded some way to make it 7-bit and that's not going to
> happen in the general case.
>
When I took a quick look at the mailbox class, it said it could take a
'string' or a 'message'. It may well be that the string option assumes
ASCII. You may need to use the message parsing options of the email
package to convert messages with extended characters into the right
format. This is one of the cases where Python 2's non-strictness made
things easier, but also much easier to get wrong if you were not
careful. Python 3 is basically making you do more work to make sure you
are doing it right.
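
As a rough sketch of what I mean (not tested against your filter.py):
if the raw message is kept as bytes and parsed with
email.message_from_bytes(), the non-ASCII bytes are carried through as
surrogate escapes and the generator can write them back out unchanged.
The sample message, the addresses and the temporary mbox path below are
made up for illustration.

```python
import email
import mailbox
import os
import tempfile

# Hypothetical raw message as it might arrive on stdin, kept as bytes.
# The body holds UTF-8 encoded non-ASCII text (8-bit, not 7-bit safe).
raw = (
    b"From: sender@example.com\n"
    b"To: chris@example.com\n"
    b"Subject: test\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)

# Parse from bytes rather than from str, so bytes above 127 are
# preserved instead of becoming ordinary non-ASCII characters.
msg = email.message_from_bytes(raw)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.mbox")  # throwaway mbox for the demo
    mbx = mailbox.mbox(path)
    try:
        mbx.add(msg)   # the call that failed in the traceback above
        mbx.flush()
    finally:
        mbx.close()

    # Read the file back as bytes to see what was actually stored.
    with open(path, "rb") as f:
        stored = f.read()

print(b"caf\xc3\xa9" in stored)
```

The mailbox documentation also lists bytes as an accepted argument to
add(), so passing the raw bytes directly may be enough if you do not
need to look at the headers first.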

-- 
Richard Damon


