Blog "about python 3"

Chris Angelico rosuav at gmail.com
Sat Jan 4 21:54:29 EST 2014


On Sun, Jan 5, 2014 at 1:41 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> wxjmfauth at gmail.com wrote:
>
>> The very interesting aspect in the way you are holding
>> unicodes (strings). By comparing Python 2 with Python 3.3,
>> you are comparing utf-8 with the internal "representation"
>> of Python 3.3 (the flexible string representation).
>
> This is incorrect. Python 2 has never used UTF-8 internally for Unicode
> strings. In narrow builds, it uses UTF-16, but makes no allowance for
> surrogate pairs in strings. In wide builds, it uses UTF-32.

That's for Python's unicode type. What Robin said was that they were
using either a byte string ("str") with UTF-8 data, or a Unicode
string ("unicode") with character data. So jmf was right, except that
it's not specifically to do with Py2 vs Py3.3.
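
A rough sketch of the distinction, written for Python 3 (where the
byte-string type is spelled "bytes" and the text type is "str"). The
sample string is made up for illustration; the UTF-16 and UTF-32 counts
are only meant to approximate what len() would have reported on narrow
and wide Python 2 builds respectively.

```python
# Character data: 7 code points, one of them outside the BMP.
text = "na\u00efve \U0001F600"

# The same data held as a byte string of UTF-8.
utf8_bytes = text.encode("utf-8")

# Code-unit counts in the fixed-width encodings Steven mentioned.
# The non-BMP character takes a surrogate pair (2 units) in UTF-16.
utf16_units = len(text.encode("utf-16-le")) // 2
utf32_units = len(text.encode("utf-32-le")) // 4

print("code points:", len(text))
print("UTF-8 bytes:", len(utf8_bytes))
print("UTF-16 code units:", utf16_units)
print("UTF-32 code units:", utf32_units)
```

The byte string and the text string carry the same content but have
different lengths and different indexing, which is the comparison Robin
was actually making.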

ChrisA


