String encoding in Py2.7

Chris Angelico rosuav at gmail.com
Tue May 29 06:48:03 EDT 2018


On Tue, May 29, 2018 at 8:39 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Tue, 29 May 2018 09:19:52 +0000, Fabien LUCE wrote:
>
>> May 29 2018 11:12 AM, "Thomas Jollans" <tjol at tjol.eu> wrote:
>>> On 2018-05-29 09:55, ftg at lutix.org wrote:
>>>
>>>> Hello,
>>>> Using Python 2.7 (will switch to Py3 soon, but before that I'd like
>>>> to understand how string encoding works)
>>>
>>> Oh dear. This is probably the exact wrong way to go about it: the
>>> interplay between string encoding, unicode and bytes is much less clear
>>> and easy to understand in Python 2.
>>
>> Ok I will quickly jump into py3 then.
>
> While I applaud this decision -- the latest Python 3.x series is much
> better than 2.7 -- please don't imagine that moving to Python 3 will
> eliminate all encoding issues, especially when dealing with real-world
> data that comes to you in a mix of weird and often broken encodings.
>
> Python 3 eliminates one common source of problems: unlike Python 2, it
> won't try to guess what you mean when you combine bytes and Unicode
> text. In Python 2, that worked for the simple cases, and was often
> convenient, but at the cost of leading to hard to diagnose and hard to
> fix errors in the complex cases. Python 3 no longer guesses, which means
> you have to be more diligent in converting bytes to text and vice versa.

Python 3 eliminates a number of common sources of problems; in fact,
it eliminates a large number of problems. But you're right that it's
no panacea, since there cannot ever be a perfect solution.
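
As a rough illustration of the bytes/text point quoted above, here is
a minimal sketch, assuming Python 3. The names `raw` and `label` and
the sample data are made up for the example; they are not from the
thread.

```python
# Python 3: bytes and str are never combined implicitly.
raw = b"caf\xc3\xa9"   # UTF-8 encoded bytes, e.g. read from a file or socket
label = "name: "       # text (str)

try:
    label + raw        # Python 2 would attempt an implicit ASCII decode when
                       # mixing unicode and str; Python 3 refuses outright
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert explicitly, stating the encoding yourself.
text = label + raw.decode("utf-8")   # bytes -> text
data = text.encode("utf-8")          # text -> bytes
```

The explicit decode/encode calls are the "more diligent" part: the
encoding has to be named, so a wrong guess shows up where the
conversion happens rather than somewhere downstream.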

> Also, it has to be said that Python 3 makes one use-case harder: mixed
> binary bytes plus ASCII text. (Or so I've been told.)

Early versions of Py3 yes, but the latest versions have had features
added that restore this to its Py2 simplicity (for ASCII
specifically).
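
The post does not name the features meant here. One likely candidate
is %-formatting on bytes objects, added in Python 3.5 (PEP 461). A
small sketch under that assumption, with a made-up header and payload:

```python
# Requires Python 3.5 or later (PEP 461: %-formatting for bytes).
payload = b"\x00\x01\xff"   # arbitrary binary data

# ASCII text and numbers interpolated directly into bytes, with no
# detour through str and .encode().
header = b"%s: %d\r\n" % (b"Content-Length", len(payload))
packet = header + payload
```

On Python 3.0 through 3.4 the `%` line raises TypeError, which matches
the "early versions" caveat.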

ChrisA


