Another 2 to 3 mail encoding problem

Peter J. Holzer hjp-python at hjp.at
Wed Aug 26 14:06:48 EDT 2020


On 2020-08-26 16:10:35 +0100, Chris Green wrote:
> I'm unearthing a few issues here trying to convert my mail filter and
> delivery programs from 2 to 3!  
> 
> I have a simple[ish] local mbox mail delivery module as follows:-
> 
[...]
>     class mymbox(mailbox.mbox):
>         def _pre_message_hook(self, f):
>             """Don't write the blank line before the 'From '"""
>             pass
[...]
>     def deliverMboxMsg(dest, msg, log):
[...]
>         mbx = mymbox(dest, factory=None)
[...]
>                 mbx.add(msg)
[...]
> 
> 
> It has run faultlessly for many years under Python 2.  I've now
> changed the calling program to Python 3 and while it handles most
> E-Mail OK I have just got the following error:-
> 
>     Traceback (most recent call last):
>       File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
>         mailLib.deliverMboxMsg(dest, msg, log)
>       File "/home/chris/.mutt/bin/mailLib.py", line 52, in deliverMboxMsg
>         mbx.add(msg)
>       File "/usr/lib/python3.8/mailbox.py", line 603, in add
>         self._toc[self._next_key] = self._append_message(message)
>       File "/usr/lib/python3.8/mailbox.py", line 758, in _append_message
>         offsets = self._install_message(message)
>       File "/usr/lib/python3.8/mailbox.py", line 830, in _install_message
>         self._dump_message(message, self._file, self._mangle_from_)
>       File "/usr/lib/python3.8/mailbox.py", line 215, in _dump_message
>         gen.flatten(message)
>       File "/usr/lib/python3.8/email/generator.py", line 116, in flatten
>         self._write(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 181, in _write
>         self._dispatch(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 214, in _dispatch
>         meth(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 432, in _handle_text
>         super(BytesGenerator,self)._handle_text(msg)
>       File "/usr/lib/python3.8/email/generator.py", line 249, in _handle_text
>         self._write_lines(payload)
>       File "/usr/lib/python3.8/email/generator.py", line 155, in _write_lines
>         self.write(line)
>       File "/usr/lib/python3.8/email/generator.py", line 406, in write
>         self._fp.write(s.encode('ascii', 'surrogateescape'))
>     UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 4: ordinal not in range(128)

The problem is that the message contains a '\ufeff' character (byte
order mark) where email/generator.py expects only ASCII characters.
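
For illustration, the last frame of the traceback can be imitated in
isolation. This is only a sketch: the helper name encode_like_generator
and the sample strings are made up, and only the encode() call is taken
from the traceback. It also shows that 'surrogateescape' only helps for
lone surrogates (which is how undecodable bytes are represented when a
message is parsed from bytes), not for a real Unicode character such as
'\ufeff'.

```python
def encode_like_generator(text):
    """Encode text the same way the final traceback frame does."""
    return text.encode('ascii', 'surrogateescape')

# A genuine non-ASCII character cannot be encoded this way.
try:
    encode_like_generator('abc\ufeffdef')
    bom_error = None
except UnicodeEncodeError as exc:
    bom_error = exc

# A byte smuggled through decoding with surrogateescape round-trips fine.
escaped = b'caf\xe9'.decode('ascii', 'surrogateescape')
round_tripped = encode_like_generator(escaped)
```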

I see two possible reasons for this:

 * The mbox writing code assumes that all messages with non-ASCII
   characters are QP or base64 encoded, and some higher layer uses 8bit
   instead.

 * A MIME part is declared as charset=us-ascii but actually contains
   non-ASCII characters.

Both reasons are weird.

The first would be an unreasonable assumption (8bit encoding has been
common since the mid-1990s), but even if the code made that assumption,
one would expect that other code from the same library honors it.

The second shouldn't be possible: if a message is mis-declared (that
happens), one would expect the error to occur during parsing, not
when trying to serialize the already parsed message.

But then you haven't shown where msg comes from. How do you parse the
message to get "msg"?

Can you construct a minimal test message which triggers the bug?
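
Something along these lines might serve as a starting point. It is a
sketch under assumptions, not a diagnosis: RAW is an invented test
message, flatten_to_bytes is a hypothetical helper that mimics the
BytesGenerator call seen in the traceback, and I am only guessing that
msg might have been parsed from a str rather than from bytes.

```python
import email
import io
from email.generator import BytesGenerator

# Invented test message: one header, body starting with a BOM.
RAW = "Subject: test\n\n\ufeffhello\n"

def flatten_to_bytes(msg):
    """Serialize msg with BytesGenerator, as the traceback shows mailbox doing."""
    buf = io.BytesIO()
    BytesGenerator(buf, mangle_from_=True, maxheaderlen=0).flatten(msg)
    return buf.getvalue()

# Assumption: the message was parsed from text (str).
msg_from_str = email.message_from_string(RAW)
try:
    flatten_to_bytes(msg_from_str)
    str_failed = False
except UnicodeEncodeError:
    str_failed = True

# For comparison: the same message parsed from UTF-8 encoded bytes.
msg_from_bytes = email.message_from_bytes(RAW.encode("utf-8"))
out = flatten_to_bytes(msg_from_bytes)
```

If the str variant fails and the bytes variant doesn't, that would
point at how the message is read rather than at the mbox code.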

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"