Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Tue Mar 3 23:54:57 EST 2015


On Wed, Mar 4, 2015 at 3:45 PM, Rustom Mody <rustompmody at gmail.com> wrote:
>
> It lists some examples of software that somehow break/goof going from BMP-only
> unicode to 7.0 unicode.
>
> IOW the suggestion is that the two-way classification
> - ASCII
> - Unicode
>
> is less useful and accurate than the 3-way
>
> - ASCII
> - BMP
> - Unicode

How is that more useful? Aside from storage optimizations (in which
the significant breaks would be Latin-1, UCS-2, and UCS-4), the BMP is
not significantly different from the rest of Unicode.
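
To make the storage point concrete, here is a rough sketch of where those
breaks fall. The function name storage_width is made up for illustration;
it only mirrors the idea behind CPython 3.3+'s flexible string
representation (PEP 393) and is not the interpreter's actual code.

```python
def storage_width(s):
    """Return the bytes per character a width-optimized string might use.

    Hypothetical helper: picks the narrowest of 1, 2 or 4 bytes that can
    hold the largest code point in s.
    """
    highest = max(map(ord, s), default=0)
    if highest < 0x100:      # fits in Latin-1 (ASCII included)
        return 1
    if highest < 0x10000:    # fits in the BMP (UCS-2)
        return 2
    return 4                 # needs the full range (UCS-4)


for sample in ["abc", "caf\u00e9", "\u20ac", "\U0001F600"]:
    print(ascii(sample), storage_width(sample))
```

The BMP boundary only shows up here as one of three storage thresholds;
nothing about how you index, slice, or compare the string changes when you
cross it.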

Also, the expansion from 16-bit was back in Unicode 2.0, not 7.0. Why
do you keep talking about 7.0 as if it's a recent change?
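
For anyone wondering what "expansion from 16-bit" looks like in practice,
here is a small sketch. The helper utf16_code_units is a made-up name; it
just asks Python's own UTF-16 codec how many 16-bit units a character
needs.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units of ch when encoded as UTF-16.

    Hypothetical helper for illustration; uses the big-endian codec so no
    byte-order mark is added.
    """
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A BMP character fits in one unit; a non-BMP one needs a surrogate pair.
print([hex(u) for u in utf16_code_units("A")])
print([hex(u) for u in utf16_code_units("\U0001F600")])
```

Software that assumes one 16-bit unit per character breaks on the second
case, and that has been possible ever since surrogates were added.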

ChrisA


