[issue21081] missing vietnamese codec TCVN 5712:1993 in Python

Antti Haapala report at bugs.python.org
Fri Oct 21 16:03:23 EDT 2016


Antti Haapala added the comment:

I found the full document on SlideShare: http://www.slideshare.net/sacobat/tcvn-5712-1993-cng-ngh-thng-tin-b-m-chun-8bit-k-t-vit-dng-trong-trao-i-thng-tin 

As far as I can understand, the variants (VN1, VN2, VN3) are "subsets" of each other only in the sense that VN1 has the widest mapping of characters, but VN1 also partially overlaps with the C0 and C1 ranges of control characters in ISO code pages - there are 139 additional characters!

VN2 then lets C0 and C1 retain their ISO-8859 meanings by sacrificing some capital vowels (Ezio perhaps remembers that Italy is Ý in Vietnamese - sorry, can't write it in upper case in VN2). VN3 then sacrifices even more characters to leave more positions free for possibly application-specific uses (the standard is very vague about that).

The text of the standard is copy-pasteable at http://luatvn.net/tieu-chuan-viet-nam/tieu-chuan-viet-nam-tcvn5712_1993.2.171673.html - however, it lacks some of the tables.

The standard additionally has both UCS-2 mappings and Unicode names of the characters, but they're only available as images, so it would be preferable to get the mapping from, say, the iconv output.
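
A rough sketch of how such a table could be pulled out of iconv, one byte at a time. The helper names are made up for illustration, and the charset name "TCVN5712-1" is an assumption about what the installed iconv calls it (check `iconv -l`). Feeding single bytes is also an assumption; an iconv that composes base letters with following tone marks may behave differently on longer input. The latin-1 stand-in is only there so the sketch runs without iconv, it is not the TCVN mapping.

```python
import subprocess


def decode_byte_with_iconv(byte_value, charset="TCVN5712-1"):
    """Decode one byte through the iconv CLI; return None if iconv rejects it.

    Assumption: the installed iconv knows the charset under this name.
    """
    result = subprocess.run(
        ["iconv", "-f", charset, "-t", "UTF-8"],
        input=bytes([byte_value]),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8")


def build_decoding_table(decode_one):
    """Map each byte value 0..255 to its decoded text, skipping unmapped bytes."""
    table = {}
    for byte_value in range(256):
        text = decode_one(byte_value)
        if text is not None:
            table[byte_value] = text
    return table


def latin1_stand_in(byte_value):
    # Stand-in decoder so the sketch runs without iconv; NOT the TCVN mapping.
    return bytes([byte_value]).decode("latin-1")


demo_table = build_decoding_table(latin1_stand_in)
```

On a system whose iconv supports the charset, `build_decoding_table(decode_byte_with_iconv)` would give the table to compare against the pictures in the standard.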

----------
nosy: +ztane

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________

