[issue7856] cannot decode from or encode to big5 \xf9\xd8

Tue Mar 9 10:25:26 EST 2021

Max Bolingbroke <batterseapower at hotmail.com> added the comment:

As of Python 3.7.9, this also affects \xf9\xd6, which should decode to \u7881 in Unicode. It is the second character of 宏碁, the name of the Taiwanese electronics manufacturer Acer.

You can work around the issue by using the big5hkscs codec, just as with the original \xf9\xd8 problem.
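
A minimal sketch of that workaround, assuming it is acceptable to treat the input as Big5-HKSCS whenever the plain big5 codec rejects it. The helper name decode_big5_lenient is made up for illustration, and big5hkscs is not guaranteed to agree with big5 on every byte sequence, so this is only a fallback:

```python
def decode_big5_lenient(data: bytes) -> str:
    """Decode Big5 bytes, retrying with big5hkscs if plain big5 rejects them.

    Hypothetical helper, not part of the standard library.
    """
    try:
        return data.decode("big5")
    except UnicodeDecodeError:
        # Assumption: big5hkscs covers the ETen extension bytes
        # (such as \xf9\xd6) that the plain big5 codec does not map.
        return data.decode("big5hkscs")


# The first character of the company name is ordinary Big5;
# the second is the problematic \xf9\xd6 byte pair.
acer = "宏".encode("big5") + b"\xf9\xd6"
print(decode_big5_lenient(acer))
```

The same idea applies in reverse for encoding: catch UnicodeEncodeError from str.encode("big5") and retry with "big5hkscs".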

It looks like the F9D6–F9FE characters all come from the Big5-ETen extension (https://en.wikipedia.org/wiki/Big5#ETEN_extensions, https://moztw.org/docs/big5/table/eten.txt), which is so popular that it is a de facto standard. Big5-2003 (mentioned in a comment below) seems to be an extension of Big5-ETen. For what it's worth, the WHATWG includes these mappings in its own big5 reference tables: https://encoding.spec.whatwg.org/big5.html.

Unfortunately Big5 is still in common use in Taiwan. It's pretty funny that Python fails to decode Big5 documents containing the name of one of Taiwan's largest multinationals :-)

----------
nosy: +batterseapower

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue7856>
_______________________________________