mailbox misbehavior with non-ASCII

Barry barry at barrys-emacs.org
Sat Jul 30 03:55:24 EDT 2022




> On 30 Jul 2022, at 00:30, Peter Pearson <pkpearson at nowhere.invalid> wrote:
> 
> The following code produces a nonsense result with the input 
> described below:
> 
> import mailbox
> box = mailbox.Maildir("/home/peter/Temp/temp",create=False)
> x = box.values()[0]
> h = x.get("X-DSPAM-Factors")
> print(type(h))
> # <class 'email.header.Header'>
> 
> The output is the desired "str" when the message file contains this:
> 
> To: recipient at example.com
> Message-ID: <123>
> Date: Sun, 24 Jul 2022 15:31:19 +0000
> Subject: Blah blah
> From: from at from.com
> X-DSPAM-Factors: a'b
> 
> xxx
> 
> ... but if the apostrophe in "a'b" is replaced with a
> RIGHT SINGLE QUOTATION MARK, the returned h is of type 
> "email.header.Header", and seems to contain inscrutable garbage.

Include in any bug report the exact bytes that are in the header.
It may not be UTF-8 encoded; it may be Windows cp1252, etc.
The repr of the header bytes will show this.
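
Something like the following could be used to get that repr. It is only
a sketch: the sample bytes are an assumption (a UTF-8 encoded RIGHT SINGLE
QUOTATION MARK), not the contents of the actual message file, and
header_bytes is a made-up helper name. For a real report, read the Maildir
message file in binary mode, e.g. open(path, "rb").read(), and pass that in.
Folded (multi-line) headers are not handled here.

```python
# Assumed sample input: UTF-8 bytes for U+2019 in the header value.
# Replace with the bytes read from the real message file.
raw = b"Subject: Blah blah\nX-DSPAM-Factors: a\xe2\x80\x99b\n\nxxx\n"


def header_bytes(raw_message, name):
    """Return the raw bytes of the first header line named `name`, or None."""
    prefix = name.lower().encode("ascii") + b":"
    for line in raw_message.splitlines():
        if not line:  # a blank line ends the header block
            break
        if line.lower().startswith(prefix):
            return line
    return None


line = header_bytes(raw, "X-DSPAM-Factors")
print(repr(line))  # the exact bytes, suitable for pasting into a bug report

# Try candidate encodings to see which one gives a sensible result.
for enc in ("utf-8", "cp1252"):
    try:
        print(enc, "->", line.decode(enc))
    except UnicodeDecodeError as exc:
        print(enc, "-> cannot decode:", exc)
```

If the UTF-8 decode fails but cp1252 gives readable text, that is a strong
hint about which encoding the sender used.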

Barry

> 
> I realize that one should not put non-ASCII characters in
> message headers, but of course I didn't put it there, it
> just showed up, pretty much beyond my control.  And I realize
> that when software is given input that breaks the rules, one
> cannot expect optimal results, but I'd think an exception
> would be the right answer.
> 
> Is this worth a bug report?
> 
> -- 
> To email me, substitute nowhere->runbox, invalid->com.
> -- 
> https://mail.python.org/mailman/listinfo/python-list


