Pound sign problem

Chris Angelico rosuav at gmail.com
Tue Apr 11 13:26:24 EDT 2017


On Wed, Apr 12, 2017 at 2:23 AM, Lew Pitcher
<lew.pitcher at digitalfreehold.ca> wrote:
> Chris Angelico wrote:
>
>> On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher
>> <lew.pitcher at digitalfreehold.ca> wrote:
>>>
>>> What in "Try changing your target encoding to something other than ASCII"
>>> is encouragement to use "old legacy encodings"?
>>>
>>>> In 2017, unless you are reading from old legacy files created using a
>>>> non-Unicode encoding, you should just use UTF-8.
>>>
>>> Thanks for your opinion. My opinion differs.
>>
>> So what encoding *do* you recommend, and why is it better than UTF-8?
>
> I recommend whatever encoding is appropriate for the output. That's not up
> to you or me to decide; that's a question that only the OP can answer.
>
> (Imagine, python on an IBM Zseries running ZOS; the "native" characterset is
> one of the EBCDIC variants. Would UTF-8 be a better choice there? )

So if the OP needed to print out a number, would you take a similarly
spineless approach and say that only the OP can decide what numeric
base to use? Does every fledgeling programmer need to know about
archaic systems where you needed to use BCD for your numbers? EBCDIC
derives from BCD, where a single decimal digit was encoded in four
bits... and I'm sure you could name schemes even less popular, used on
important systems back in the 1960s or so. Does a modern Python
programmer need to look through all of those possible ways to
represent numbers? NO. Today's programmer should need to know about
very few ways to represent numbers, in priority order:

1) Decimal digits represented in ASCII
2) Packed binary, network byte order
3) Packed binary, little-endian
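
As a rough illustration of those three representations, here is a minimal
Python 3 sketch. The value 1234 and the fixed 32-bit unsigned width are
assumptions chosen for the example, not anything from the thread.

```python
import struct

n = 1234  # arbitrary example value (0x04D2)

# 1) Decimal digits represented in ASCII
ascii_digits = str(n).encode("ascii")

# 2) Packed binary, network byte order (big-endian), 32-bit unsigned
big_endian = struct.pack(">I", n)

# 3) Packed binary, little-endian, 32-bit unsigned
little_endian = struct.pack("<I", n)

print(ascii_digits)   # b'1234'
print(big_endian)     # b'\x00\x00\x04\xd2'
print(little_endian)  # b'\xd2\x04\x00\x00'
```

The "!" format prefix in struct means network byte order as well, so
struct.pack("!I", n) gives the same bytes as the ">I" form here.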

A new programmer shouldn't need to worry about anything other than
decimal digits, in fact. Of course other systems do exist, like the
MIDI "variable length integer" that packs seven bits into a byte and
then uses the high bit as a continuation marker, or IEEE 80-bit
floating point, or a multi-limb format like GMP uses, but until you
actually need to work with one, you don't need to know about it.
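
For the curious, the MIDI scheme mentioned above can be sketched in a few
lines of Python 3. The function names encode_vlq and decode_vlq are
hypothetical, made up for this example, and the sketch assumes the usual
MIDI layout: seven data bits per byte, most significant group first, high
bit set on every byte except the last.

```python
def encode_vlq(n):
    """Encode a non-negative int as a MIDI-style variable length quantity."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    groups = [n & 0x7F]            # last byte: high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(groups))  # most significant group first


def decode_vlq(data):
    """Decode one variable length quantity from the start of data."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:        # high bit clear marks the final byte
            return value
    raise ValueError("data ended before the final byte of the quantity")


print(encode_vlq(128).hex())          # 8100
print(decode_vlq(bytes([0xFF, 0x7F])))  # 16383
```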

Just use the one most obvious encoding. UTF-8 for all text.
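
A minimal sketch of what that means for the pound sign, assuming Python 3.
The variable names are made up for the example.

```python
text = "\N{POUND SIGN}"  # the one-character string U+00A3

# UTF-8 can represent every Unicode character; the pound sign takes two bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa3'

# ASCII has no pound sign, so encoding to it fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# Decoding with the same encoding gives the original text back.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == text)  # True
```

When writing to a file, passing encoding="utf-8" to open() applies the
same encoding without relying on the platform default.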

ChrisA


