[issue21081] missing vietnamese codec TCVN 5712:1993 in Python
Antti Haapala
report at bugs.python.org
Fri Oct 21 16:03:23 EDT 2016
Antti Haapala added the comment:
I found the full document on SlideShare: http://www.slideshare.net/sacobat/tcvn-5712-1993-cng-ngh-thng-tin-b-m-chun-8bit-k-t-vit-dng-trong-trao-i-thng-tin
As far as I understand, they're "subsets" of each other only in the sense that VN1 has the widest mapping of characters, but that mapping also partially overlaps the C0 and C1 control-character ranges of the ISO code pages; there are 139 additional characters!
VN2 then lets C0 and C1 retain their ISO-8859 meanings by sacrificing some capital vowels (Ezio perhaps remembers that Italy is Ý in Vietnamese; sorry, I can't write it in upper case in VN2). VN3 then sacrifices even more to leave some more positions free, possibly for application-specific uses (the standard is very vague about that).
The text of the standard is copy-pasteable at http://luatvn.net/tieu-chuan-viet-nam/tieu-chuan-viet-nam-tcvn5712_1993.2.171673.html - however, it lacks some of the tables.
The standard additionally has both UCS-2 mappings and Unicode names of the characters, but only as images, so it would be preferable to get the mapping from, say, the iconv output.
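A minimal sketch of how the mapping could be pulled out of iconv and turned into a charmap-style decoding table. It assumes an `iconv` command-line tool is installed and that it knows the charset under the name "TCVN5712-1"; both are assumptions, and the function names are made up for illustration. Bytes that iconv rejects, or that do not map to exactly one character, are marked undefined, so the result would still need checking against the tables in the standard.

```python
import codecs
import subprocess


def iconv_decode_byte(value, charset="TCVN5712-1"):
    """Decode one byte by piping it through the iconv CLI.

    Assumption: `iconv` is on PATH and accepts `charset` as a name.
    Returns the decoded string, or None if iconv rejects the byte.
    """
    result = subprocess.run(
        ["iconv", "-f", charset, "-t", "UTF-8"],
        input=bytes([value]),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0 or not result.stdout:
        return None
    return result.stdout.decode("utf-8")


def build_decoding_table(decode_byte):
    """Build a 256-character table from a byte -> str (or None) callable.

    U+FFFE marks an undefined position, as in the charmap codecs
    shipped in Python's `encodings` package.
    """
    chars = []
    for value in range(256):
        char = decode_byte(value)
        if char is not None and len(char) == 1:
            chars.append(char)
        else:
            chars.append("\ufffe")
    return "".join(chars)
```

With iconv available, `build_decoding_table(iconv_decode_byte)` would give a table that can be passed to `codecs.charmap_decode(data, "strict", table)`, and `codecs.charmap_build(table)` would give the matching encoding table.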
----------
nosy: +ztane
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________