[Python-Dev] Python3 "complexity"

Ben Finney ben+python at benfinney.id.au
Thu Jan 9 01:07:15 CET 2014


Kristján Valur Jónsson <kristjan at ccpgames.com> writes:

> Believe it or not, sometimes you really don't care about encodings.
> Sometimes you just want to parse text files.

Files don't contain text; they contain bytes. Bytes only become text
when filtered through the correct encoding.

Python should not guess the encoding if it's unknown. Without the right
encoding, you don't get text, you get partial or complete gibberish.
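To make this concrete, here is a minimal sketch (not from the original thread; the sample string "café" is an arbitrary assumption). It decodes the same bytes three ways. The correct encoding yields the intended text. A wrong but permissive encoding silently yields gibberish. A wrong, strict encoding raises an error.

```python
# "café" encoded as UTF-8: five bytes, b'caf\xc3\xa9'.
data = "café".encode("utf-8")

# Decoded with the encoding that produced it, the bytes become the intended text.
right = data.decode("utf-8")

# Decoded with a wrong but permissive encoding, you silently get gibberish.
wrong = data.decode("latin-1")

# Decoded with a wrong, strict encoding, you get an exception instead.
ascii_error = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc.reason

print(repr(right))        # 'café'
print(repr(wrong))        # 'cafÃ©'
print(ascii_error)        # ordinal not in range(128)
```

Neither wrong result is "close enough" text: one is corrupted without warning, the other is not text at all.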

So, if what you want is to parse text and not get gibberish, you need to
*tell* Python what the encoding is. That's a brute fact of the world of
text in computing.
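In practice, telling Python the encoding is one keyword argument to `open()`. The sketch below is illustrative only. The file name `contacts.txt` and its contents are hypothetical, and the remark about the default assumes UTF-8 mode is not enabled: without `encoding=`, text-mode `open()` falls back to a locale-dependent default (reported by `locale.getpreferredencoding(False)`), which can differ between machines.

```python
import locale
import tempfile
from pathlib import Path

# The implicit default varies by platform and locale settings.
print("default on this machine:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file with non-ASCII content.
    path = Path(tmp) / "contacts.txt"

    # Write text using an explicitly chosen encoding...
    path.write_text("Kristján: 555-0100\n", encoding="utf-8")

    # ...the file itself holds only bytes...
    raw = path.read_bytes()

    # ...which become text again only when we name the same encoding.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print(raw)
print(text, end="")
```

Stating the encoding makes the program behave identically everywhere, instead of depending on whatever the local default happens to be.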

> Python 3 forces you to think about abstract concepts like encodings
> when all you want is to open that .txt file on the drive and extract
> some phone numbers and merge in some email addresses.  What encoding
> does the file have?  Do I care?  Must I care?

Yes, you must.

> Python forcing you to think about this is like the cashier at the
> hardware store who won't let you buy the hammer you brought to the
> cash register because you don't know what wood its handle is made of.

The cashier is making a mistake: the hammer, regardless of the wood in
the handle, still functions just fine as a hammer. Hence, the question
is unimportant to the purpose.

The same is not true of the encoding of text. The encoding
matters, and the programmer needs to care.

-- 
 \         “How wonderful that we have met with a paradox. Now we have |
  `\                        some hope of making progress.” —Niels Bohr |
_o__)                                                                  |
Ben Finney


