How is unicode implemented behind the scenes?

Roy Smith roy at panix.com
Sat Mar 8 22:01:59 EST 2014


In article <531bd709$0$29985$c3e8da3$5496439d at news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:

> There are various common ways to store Unicode strings in RAM.
> 
> The first, UTF-16.
> [...]
> Another option is UTF-32.
> [...]
> Another option is to use UTF-8 internally.
> [...]
> In Python 3.3, CPython introduced an internal scheme that gives the best 
> of all worlds. When a string is created, Python uses a different 
> implementation depending on the characters in the string:

This was an excellent post, but I would take exception to the "best of 
all worlds" statement.  I would put it a little less absolutely and say 
something like, "a good compromise for many common use cases".  I would 
even go with, "... for most common use cases".  But there are 
situations where it loses.
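
To make that concrete, here is a rough sketch of one such situation. It
is my own illustration and not necessarily the case anyone in this
thread had in mind. It assumes CPython 3.3 or later, where the string's
widest character decides the storage width for the whole string. The
helper name per_char_bytes is made up for this example, and
sys.getsizeof() figures are implementation details that can differ
between builds.

```python
import sys

def per_char_bytes(ch, n=1000):
    """Estimate storage per character by differencing two string sizes.

    Hypothetical helper for illustration; only meaningful on CPython 3.3+.
    Subtracting the two sizes cancels out the fixed object header.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

# One sample character from each range the flexible representation
# distinguishes (ASCII, Latin-1, rest of the BMP, astral planes).
ascii_width = per_char_bytes("a")
latin1_width = per_char_bytes("\xe9")
bmp_width = per_char_bytes("\u20ac")
astral_width = per_char_bytes("\U0001F600")

# The losing case: a single astral character widens every character
# in an otherwise all-ASCII string.
mostly_ascii = "a" * 1000
one_astral = "a" * 999 + "\U0001F600"

if __name__ == "__main__":
    print("bytes per char:", ascii_width, latin1_width, bmp_width, astral_width)
    print("1000 ASCII chars:     ", sys.getsizeof(mostly_ascii), "bytes")
    print("999 ASCII + 1 astral: ", sys.getsizeof(one_astral), "bytes")
```

On the CPython builds I would expect this to run on, the second string
comes out close to four times the size of the first, even though the
two differ by one character. A workload that mixes a few such
characters into mostly-ASCII text pays for it in memory compared to
UTF-8 or UTF-16 storage.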


