[issue21081] missing vietnamese codec TCVN 5712:1993 in Python
Antti Haapala
report at bugs.python.org
Fri Oct 21 17:02:40 EDT 2016
Antti Haapala added the comment:
Ah, there was something I overlooked before: VN1 and VN2 both have combining accents too. If I read correctly, the main letter should precede the combining character, just as in Unicode; VN3 seems to lack combining characters altogether.
Thus, for simple text conversion from VN* to Unicode, VN1 should be enough, but some VN2/VN3 control or application-specific codes might show up as accented capital letters.
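To illustrate the ordering mentioned above (base letter first, combining mark second), here is a minimal sketch using only the standard unicodedata module. It shows the Unicode side only; the sample letter is my own choice, not something taken from the TCVN tables.

```python
import unicodedata

# Base letter first, combining mark second: the order Unicode uses.
decomposed = '\u00ea' + '\u0301'   # 'ê' followed by COMBINING ACUTE ACCENT

# NFC normalization folds the pair into one precomposed code point.
composed = unicodedata.normalize('NFC', decomposed)

print(len(decomposed), len(composed))   # 2 1
print(unicodedata.name(composed))
```

A decoder that emits the combining sequences as-is would still produce valid Unicode; normalizing to NFC afterwards is optional.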
---
The following script rips the table from iconv:
import subprocess

mapping = subprocess.run('iconv -f TCVN -t UTF-8'.split(),
                         input=bytes(range(256)),
                         stdout=subprocess.PIPE).stdout.decode()
iconv has several aliases for this encoding, but all of them seemed to produce identical output, and that output matches VN1 from the tables.
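A string like `mapping` could then be turned into charmap tables in the same way the existing single-byte codecs in the encodings package do it. This is only a rough sketch, not a proposed patch: the helper names (`build_charmap`, `decode`, `encode`) are made up for illustration, and it assumes the string holds exactly one code point per byte value, which may not be true if iconv composes or drops anything. The demo uses a Latin-1 stand-in so that it runs without iconv.

```python
import codecs

def build_charmap(mapping):
    """Return (decoding_table, encoding_table) for a 256-character string.

    mapping[i] must be the single code point that byte value i decodes to.
    """
    if len(mapping) != 256:
        # A different length means the table would be misaligned.
        raise ValueError('expected 256 code points, got %d' % len(mapping))
    return mapping, codecs.charmap_build(mapping)

def decode(data, decoding_table, errors='strict'):
    # charmap_decode returns (text, bytes_consumed); keep the text.
    return codecs.charmap_decode(data, errors, decoding_table)[0]

def encode(text, encoding_table, errors='strict'):
    # charmap_encode returns (bytes, chars_consumed); keep the bytes.
    return codecs.charmap_encode(text, errors, encoding_table)[0]

# Stand-in table for illustration only; the real one would be the
# `mapping` string obtained from iconv.
demo_mapping = bytes(range(256)).decode('latin-1')
decoding_table, encoding_table = build_charmap(demo_mapping)
```

The length check is there because any byte that iconv turns into zero or several code points would shift every later entry in the table.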
Also, luatvn.net *did* additionally have a copyable VN1 to UCS-2 table.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________