[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

Tue Jul 6 12:47:15 EDT 2021

R. David Murray <rdmurray at bitdance.com> added the comment:

I can't tell for sure from a quick glance at the code whether this behavior is intentional (though, like you, I wouldn't think it would be).

That's part of the legacy API, at this point.  The new API will just use utf-8:

from email.message import EmailMessage

m = EmailMessage()
m['Subject'] = '中文'

print(bytes(m))

results in

b'Subject: =?utf-8?b?5Lit5paH?=\n\n'
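
As a quick sanity check, the encoded header can be parsed back with the default policy to confirm that the subject round-trips. This is only a minimal sketch; the names raw and parsed are illustrative.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build the same message as in the example above.
m = EmailMessage()
m['Subject'] = '中文'

# Serialize to bytes; the non-ASCII subject becomes an RFC 2047 encoded word.
raw = bytes(m)

# Parse the bytes back using the new API's default policy,
# which decodes encoded words when the header is accessed.
parsed = message_from_bytes(raw, policy=policy.default)

print(raw)
print(parsed['Subject'])
```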

The fix, assuming it is correct, would be to add the line:

    'eucgb2312_cn': 'gb2312',

to the CODEC_MAP in email/charset.py, and then specify the internal codec name in your Charset call.  I'm not sure that's right, though...once upon a time I think I understood the logic behind the charset module, but I no longer remember the details.
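
Before patching anything, it may help to look at what the legacy API currently does. The sketch below prints the existing CODEC_MAP entry and the attributes of a Charset instance, then encodes a header through the legacy Header class. It assumes 'gb2312' is the charset name being passed in (the original report may have used something else), and no expected output is shown because it can differ between Python versions.

```python
from email import charset
from email.header import Header

# Assumption: 'gb2312' is the charset name the caller passes in.
name = 'gb2312'

# What the legacy module maps this charset name to, if anything.
print(charset.CODEC_MAP.get(name))

# The charset and codec names a Charset instance ends up with.
cs = charset.Charset(name)
print(cs.input_charset, cs.input_codec)
print(cs.output_charset, cs.output_codec)

# Encode a header via the legacy API and inspect the charset label
# that ends up inside the encoded word.
encoded = Header('中文', name).encode()
print(encoded)
```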

I'd recommend just using the new API and not the legacy API.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44560>
_______________________________________