Unicode

Matt Ruffalo mruffalo at cs.cmu.edu
Sun Sep 17 19:18:34 EDT 2017


On 2017-09-17 17:27, leam hall wrote:
>
> Ah! So this works in Py2:
>    def __str__(self):
>      name    = self.name.encode("utf-8")
>
>
> It completely fails in Py3:
>   PVT b'Lakeisha F\xc3\xa1bi\xc3\xa1n' 7966A4     [F] Age: 22
>
>
> Note that moving __str__() to display() gets the same results. Not sure it
> is an issue with __str__.
>
>
>
>> The more you think about it the more attractive a switch to Python 3 will
>> appear.
>>
> Not for me, actually. I'm trying to learn better OOP and coding in general.
> I'm using Python because there's a work related use case for Py2. There
> isn't one for Py3. If the work use case is removed then there are lots of
> languages to try out.

Hi Leam-

Targeting Python 2.6 for deployment on RHEL/CentOS 6 is a perfectly
valid use case, and after the recent discussions in multiple threads
(your "Design: method in class or general function?" and INADA Naoki's
"People choosing Python 3"), I doubt it would be very useful to
reiterate the same points.

I can't speak for Peter Otten, but I suspect he was making a very narrow
statement about one of the large backwards-incompatible changes in
Python 3: strict separation between text (str) and binary data (bytes).
This stricter distinction eliminates the conceptual problems you
described by forcing you to use the right type at the right time in
the right place, and it would probably have prevented your problem
entirely.

Additionally, your note of "this works in Python 2 but fails in Python
3" shows some text-related confusion that is quite common when dealing
with the text model in Python 2. It is always the case that the
`__str__` method should return a `str` object under whichever version of
Python you're using, and your attempt of `self.name.encode("utf-8")`
returns the wrong type under Python 3. *Encoding* Unicode text (class
`unicode` under Python 2, `str` under 3) produces binary data (class
`str` under Python 2, `bytes` under 3). As such, you're returning a
`bytes` object from `__str__` in Python 3, which is incorrect. It would
be appropriate to do something like

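"""
import sys

def __str__(self):
    if sys.version_info[0] < 3:
        # Python 2: __str__ must return encoded bytes (class `str`)
        return self.name.encode("utf-8")
    # Python 3: self.name is already text (class `str`)
    return self.name
"""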
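
To make the direction of encoding and decoding concrete, here is a
small sketch using the name from your output. The variable names are
mine, and the `__future__` imports are only there so that the same
lines should behave the same way under 2.6+ and 3:

```python
from __future__ import print_function, unicode_literals

# Text: class `unicode` under Python 2, `str` under Python 3.
name = "Lakeisha F\u00e1bi\u00e1n"

# Encoding text produces binary data: `str` under 2, `bytes` under 3.
encoded = name.encode("utf-8")

# Decoding binary data gives the text back.
decoded = encoded.decode("utf-8")

print(type(name).__name__)
print(type(encoded).__name__)
print(encoded == b"Lakeisha F\xc3\xa1bi\xc3\xa1n")
print(decoded == name)
```

The `b'...'` in your Python 3 output is simply the repr of that
encoded value, which is what you get when binary data is formatted
into text.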

Django provides a `python_2_unicode_compatible` decorator that allows
always returning text (class `unicode` under Python 2, `str` under 3)
from `__str__`, and automatically rewrites a class' methods under Python
2. That decorator renames `__str__` to `__unicode__`, and creates a new
`__str__` method that essentially returns
`self.__unicode__().encode('utf-8')`.
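
If you would rather not depend on Django, a rough sketch of the same
idea could look like the following. This is my own approximation of
the behavior described above, not Django's source;
`py2_unicode_compatible` and `Person` are hypothetical names chosen
for illustration.

```python
import sys

PY2 = sys.version_info[0] < 3


def py2_unicode_compatible(cls):
    """Hypothetical stand-in for the decorator described above."""
    if PY2:
        # Keep the text-returning method available as __unicode__ ...
        cls.__unicode__ = cls.__str__
        # ... and make __str__ return UTF-8 encoded bytes instead.
        cls.__str__ = lambda self: self.__unicode__().encode("utf-8")
    # Under Python 3 the class is returned unchanged.
    return cls


@py2_unicode_compatible
class Person(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Always return text here, whichever version is running.
        return self.name


p = Person(u"Lakeisha F\u00e1bi\u00e1n")
print(str(p) == p.name)
```

With this in place the class body only ever deals with text, and the
version check lives in one spot instead of in every `__str__`.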

(Hopefully this is clear enough, but I intended this message to be
practical advice for your current task and mental model of what's going
on, *not* as Python 3 evangelism.)

MMR...
