why isn't Unicode the default encoding?

John Salerno johnjsal at NOSPAMgmail.com
Mon Mar 20 16:50:27 EST 2006


Martin v. Löwis wrote:
> John Salerno wrote:
>> Robert Kern wrote:
>>
>>>   http://www.joelonsoftware.com/articles/Unicode.html
>>
>> That was fascinating. Thank you. So as it turns out, Unicode and UTF-8 
>> are not the same thing? Am I right to say that UTF-8 stores the first 
>> 128 Unicode code points in a single byte, and then stores higher code 
>> points in however many bytes they may need? If so, I guess I had been 
>> misled by the '8' in the name, thinking that UTF-8 was another way of 
>> storing characters in one byte (which would make it no different than 
>> Latin-1, I suppose).
> 
> That's all correct, except for the last parenthetical remark: using
> a single-byte character set isn't the same as using Latin-1. There
> are various single-byte character sets; they have names like Latin-2,
> Latin-5, Latin-15, KOI8-R, CP437, windows-1252, and so on.
> 
> Regards,
> Martin

Oh, I only meant that Latin-1 is one example of a single-byte character 
set. So UTF-8 would have behaved like it (one byte per character) if it 
worked the way I used to think it did.
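
For what it's worth, here is a small sketch of the difference as I now 
understand it. It assumes Python 3 str/bytes semantics, and the sample 
characters and variable names are just ones I picked for illustration. It 
shows that UTF-8 uses one byte only for the first 128 code points, and that 
the same single byte means different things in different single-byte 
character sets.

```python
# Sample characters: ASCII, Latin-1 range, BMP, and beyond the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes UTF-8 needs for each character (grows with the code point).
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# For the first 128 code points, UTF-8 and Latin-1 produce the same byte.
ascii_same = "A".encode("utf-8") == "A".encode("latin-1")

# Above 127 they differ: Latin-1 uses one byte, UTF-8 uses two for U+00E9.
e_latin1 = "\u00e9".encode("latin-1")
e_utf8 = "\u00e9".encode("utf-8")

# One and the same byte decoded under three single-byte character sets.
raw = b"\xe9"
decoded = {name: raw.decode(name) for name in ("latin-1", "koi8-r", "cp437")}

print(utf8_lengths)
print(ascii_same, e_latin1, e_utf8)
print(decoded)
```

So a single-byte character set can only ever cover 256 characters, and 
which 256 depends on the set, while UTF-8 covers all of Unicode by varying 
the number of bytes per character.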


