String encoding in Py2.7

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue May 29 06:39:54 EDT 2018


On Tue, 29 May 2018 09:19:52 +0000, Fabien LUCE wrote:

> May 29 2018 11:12 AM, "Thomas Jollans" <tjol at tjol.eu> wrote:
>> On 2018-05-29 09:55, ftg at lutix.org wrote:
>> 
>>> Hello,
>>> Using Python 2.7 (will switch to Py3 soon but Before I'd like to
>>> understand how string encoding worked)
>> 
>> Oh dear. This is probably the exact wrong way to go about it: the
>> interplay between string encoding, unicode and bytes is much less clear
>> and easy to understand in Python 2.
> 
> Ok I will quickly jump into py3 then.

While I applaud this decision -- the latest Python 3.x series is much 
better than 2.7 -- please don't imagine that moving to Python 3 will 
eliminate all encoding issues, especially when dealing with real-world 
data that comes to you in a mix of weird and often broken encodings.
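Here is a minimal sketch of what broken real-world data looks like. The input is mostly UTF-8 but contains one stray Latin-1 byte. The byte string and the helper name decode_leniently are made up for illustration. By default Python 3 raises UnicodeDecodeError, so you have to choose an error handler yourself.

```python
# Hypothetical input: UTF-8 "café", then "caf" plus a stray Latin-1 byte 0xE9.
raw = b"caf\xc3\xa9 and caf\xe9"

def decode_leniently(data):
    """Hypothetical helper: strict UTF-8 first, then replace bad bytes with U+FFFD."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # The stray byte is not valid UTF-8, so fall back to "replace".
        return data.decode("utf-8", errors="replace")

text = decode_leniently(raw)
print(text)  # the stray byte shows up as U+FFFD, the replacement character

# "surrogateescape" keeps undecodable bytes as lone surrogates,
# so the original bytes can be recovered exactly on the way out.
escaped = raw.decode("utf-8", errors="surrogateescape")
assert escaped.encode("utf-8", errors="surrogateescape") == raw
```

"replace" loses information, while "surrogateescape" preserves it at the cost of carrying odd surrogate characters around in your text. Which one is right depends on what you do with the data afterwards.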

Python 3 eliminates one common source of problems: unlike Python 2, it 
won't try to guess what you mean when you combine bytes and Unicode 
text. In Python 2, that guessing worked for the simple cases, and was 
often convenient, but at the cost of hard-to-diagnose and hard-to-fix 
errors in the complex cases. Python 3 no longer guesses, which means 
you have to be more diligent in converting bytes to text and vice versa.
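Here is a minimal sketch of that difference. In Python 2, 'abc' + u'def' silently decoded the byte string with the default ASCII codec, which works until the bytes contain something non-ASCII. In Python 3 the same mix raises TypeError, and the conversion has to be spelled out with decode() and encode(). The variable names and the choice of UTF-8 are only for illustration.

```python
data = b"caf\xc3\xa9 "   # bytes, e.g. read from a socket or a file opened in "rb" mode
name = "menu"            # str, i.e. Unicode text

try:
    data + name          # Python 3 refuses to guess an encoding here
except TypeError as exc:
    print("TypeError:", exc)

# Be explicit instead: decode bytes -> str, or encode str -> bytes.
as_text = data.decode("utf-8") + name      # str: 'café menu'
as_bytes = data + name.encode("utf-8")     # bytes: b'caf\xc3\xa9 menu'
```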

Also, it has to be said that Python 3 makes one use-case harder: mixed 
binary bytes plus ASCII text. (Or so I've been told.)
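For what it's worth, here is a small sketch of that mixed case. It uses a made-up record format: an ASCII "LEN:" header followed by raw binary bytes. It shows two things that tend to trip people up. First, indexing a bytes object gives an int rather than a one-character string. Second, %-formatting is the only string-formatting tool that works on bytes (added in Python 3.5 by PEP 461); str.format() and f-strings produce text, not bytes.

```python
payload = b"\x00\x01\xff"                 # arbitrary binary data

# Build the hypothetical record: ASCII header, then the binary payload.
header = b"LEN:%d\r\n" % len(payload)     # bytes %-formatting needs Python 3.5+
record = header + payload

# Indexing bytes gives an int in Python 3 (Python 2 gave a 1-char str).
first = record[0]                          # 76, i.e. ord("L")
# Slicing still gives bytes.
label = record[:4]                         # b'LEN:'

# Parse it back: split off the ASCII header, then read the length.
head, body = record.split(b"\r\n", 1)
length = int(head[4:])                     # int() accepts ASCII digits given as bytes
assert length == len(body) and body == payload
```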

But for the common case where you have human-readable text in Unicode 
and machine-readable data in bytes, and can keep the two separate, 
Python 3 is much better.
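Keeping the two separate usually means decoding bytes to text as early as possible, working only with str inside the program, and encoding back to bytes at the very end. This pattern is often called the "Unicode sandwich". Here is a minimal sketch in which io.BytesIO stands in for a real file or socket and a hypothetical shout() function does the work. UTF-8 is an assumption about the data.

```python
import io

def shout(in_stream, out_stream, encoding="utf-8"):
    """Hypothetical example: upper-case the text carried by a byte stream."""
    raw = in_stream.read()                      # bytes come in...
    text = raw.decode(encoding)                 # ...decode once, at the edge
    result = text.upper()                       # work with str only in the middle
    out_stream.write(result.encode(encoding))   # encode once, on the way out

src = io.BytesIO("naïve café".encode("utf-8"))
dst = io.BytesIO()
shout(src, dst)
print(dst.getvalue().decode("utf-8"))          # NAÏVE CAFÉ
```

Because the only decode and encode calls sit at the edges, any encoding problem surfaces in one obvious place instead of somewhere deep inside the program logic.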

I recommend you start with reading these if you haven't already:

https://nedbatchelder.com/text/unipain.html

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Sorry for the huge URL; if your mail client breaks it, try this: 
https://tinyurl.com/h8yg9d7




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



