[issue21081] missing vietnamese codec TCVN 5712:1993 in Python

Antti Haapala report at bugs.python.org
Fri Oct 21 17:02:40 EDT 2016


Antti Haapala added the comment:

Ah, there was something I overlooked before: both VN1 and VN2 have combining accents too. If I read the tables correctly, the base letter should precede the combining character, just as in Unicode; VN3 seems to lack combining characters altogether.

Thus, for simple text conversion from VN* to Unicode, VN1 should be enough, but some VN2/VN3 control or application-specific codes might show up as accented capital letters.
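Because the base letter comes before the combining mark, as in Unicode, a character-by-character VN1/VN2 decode could be followed by NFC normalization to get precomposed Vietnamese letters. This is a minimal sketch that uses only `unicodedata`. It works on already-decoded code points: the inputs below are hand-written Unicode, not actual TCVN bytes. `compose` is a hypothetical helper name.

```python
import unicodedata


def compose(decoded: str) -> str:
    """Fold each base letter plus following combining mark(s) into one code point.

    Hypothetical helper: assumes `decoded` is the output of a byte-by-byte
    VN1/VN2 decode in which the combining accent comes *after* its base letter.
    """
    return unicodedata.normalize('NFC', decoded)


# e-circumflex followed by COMBINING ACUTE ACCENT (U+0301) -> U+1EBF
print(compose('\u00ea\u0301') == '\u1ebf')
# 'a' followed by COMBINING DOT BELOW (U+0323) -> U+1EA1
print(compose('a\u0323') == '\u1ea1')
# A mark placed *before* its letter has nothing to attach to and is left alone
print(compose('\u0301a') == '\u0301a')
```

The last line shows why the ordering matters: NFC only composes a mark with a preceding base letter.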

---

The following script rips the table from iconv by piping all 256 byte values through it:

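    import subprocess

    # Pipe every byte value 0x00..0xFF through iconv; the UTF-8 output
    # should be the decoding table, in byte order.
    mapping = subprocess.run('iconv -f TCVN -t UTF-8'.split(),
                             input=bytes(range(256)),
                             stdout=subprocess.PIPE,
                             check=True).stdout.decode('utf-8')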

iconv has several aliases for this encoding, but all of them seemed to produce identical output, and that output matches VN1 from the tables.

Additionally, luatvn.net *did* have a copyable VN1-to-UCS-2 table.
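Either table (the iconv dump or the luatvn.net VN1-UCS-2 table) could be turned into a Python codec the same way the generated charmap modules in `Lib/encodings` (e.g. `cp1252.py`) are built: `codecs.charmap_build`, `codecs.charmap_decode` and `codecs.charmap_encode`. This is a minimal sketch, assuming the table has been flattened into a 256-character string with one code point per byte and U+FFFE for unassigned bytes. `make_charmap_codec` is a hypothetical name, and the Latin-1 identity table in the usage lines is only a stand-in, not TCVN data.

```python
import codecs
from typing import Callable, Tuple


def make_charmap_codec(
    decoding_table: str,
) -> Tuple[Callable[[str], bytes], Callable[[bytes], str]]:
    """Return (encode, decode) functions for a single-byte charmap.

    decoding_table[i] is the character for byte value i; U+FFFE marks an
    undefined byte, as in the generated modules under Lib/encodings.
    """
    if len(decoding_table) != 256:
        # e.g. if the source merged a letter and a combining mark into one character
        raise ValueError(f'expected 256 entries, got {len(decoding_table)}')
    # Build the reverse (character -> byte) map once, up front.
    encoding_table = codecs.charmap_build(decoding_table)

    def encode(text: str) -> bytes:
        return codecs.charmap_encode(text, 'strict', encoding_table)[0]

    def decode(data: bytes) -> str:
        return codecs.charmap_decode(data, 'strict', decoding_table)[0]

    return encode, decode


# Placeholder table: Latin-1 identity mapping, NOT the real TCVN/VN1 table.
identity = ''.join(chr(i) for i in range(256))
encode, decode = make_charmap_codec(identity)
print(decode(b'caf\xe9'))
print(encode('caf\u00e9'))
```

The length check guards against the one-character-per-byte assumption failing, for example if a converter composed a base letter with a following combining byte.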

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________


More information about the Python-bugs-list mailing list