How is unicode implemented behind the scenes?
Roy Smith
roy at panix.com
Sat Mar 8 22:01:59 EST 2014
In article <531bd709$0$29985$c3e8da3$5496439d at news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> There are various common ways to store Unicode strings in RAM.
>
> The first, UTF-16.
> [...]
> Another option is UTF-32.
> [...]
> Another option is to use UTF-8 internally.
> [...]
> In Python 3.3, CPython introduced an internal scheme that gives the best
> of all worlds. When a string is created, Python uses a different
> implementation depending on the characters in the string:
This was an excellent post, but I would take exception to the "best of
all worlds" statement. I would put it a little less absolutely and say
something like, "a good compromise for many common use cases". I would
even go with, "... for most common use cases". But there are
situations where it loses.
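The scheme quoted above is CPython's flexible string representation (PEP 393, Python 3.3+). Each string is stored with 1, 2 or 4 bytes per code point, chosen from the widest character it contains. Below is a minimal sketch that shows this from Python code. It assumes CPython 3.3 or later, because `sys.getsizeof` reports implementation-specific sizes. The helper `bytes_per_char` is hypothetical and written only for this illustration; it subtracts the sizes of two string lengths so the fixed object header cancels out. The last lines show one trade-off of the kind the reply alludes to. This is our example, not necessarily the case the poster had in mind: a single astral character in an otherwise ASCII string forces the whole string to 4 bytes per character.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> float:
    """Hypothetical helper: estimate the per-character width CPython chose
    for a string made only of `ch`. Differencing two lengths cancels the
    fixed header. CPython-specific (PEP 393); other interpreters differ."""
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) / n

samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+03A9": "\u03a9",
    "astral U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label:16} -> {bytes_per_char(ch):.0f} byte(s) per char")

# One trade-off: a single wide character widens the storage of the whole string.
ascii_only = "a" * 10_000
with_emoji = ascii_only + "\U0001F600"
print("ASCII only:", sys.getsizeof(ascii_only), "bytes")
print("plus one emoji:", sys.getsizeof(with_emoji), "bytes")
```

On CPython the loop should report 1, 1, 2 and 4 bytes per character. The second string should take roughly four times the memory of the first, even though it is only one character longer.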