[Python-Dev] Python3 "complexity"

anatoly techtonik techtonik at gmail.com
Fri Jan 10 01:53:20 CET 2014


On Thu, Jan 9, 2014 at 10:00 AM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 09/01/2014 06:50, Lennart Regebro wrote:
>>
>> On Thu, Jan 9, 2014 at 1:07 AM, Ben Finney <ben+python at benfinney.id.au>
>> wrote:
>>>
>>> Kristján Valur Jónsson <kristjan at ccpgames.com> writes:
>>>
>>>> Believe it or not, sometimes you really don't care about encodings.
>>>> Sometimes you just want to parse text files.
>>>
>>>
>>> Files don't contain text, they contain bytes. Bytes only become text
>>> when filtered through the correct encoding.
>>
>>
>> To be honest, you can define text as "A stream of bytes that are split
>> up in lines separated by a linefeed", and do some basic text
>> processing like that. Just very *basic*, but still. Replacing
>> characters. Extracting certain lines etc.
>>
>> This is harder in Python 3, as bytes does not have all the
>> functionality strings has, like formatting. This can probably be fixed
>> in Python 3.5, if the relevant PEP gets finished.
>>
>> For the battery analogy, that's like saying:
>>
>> "I want a battery."
>>
>> "What kind?"
>>
>> "It doesn't matter, as long as it's over 5V."
>>
>> //Lennart
>>
>
> "That Python 3 battery you sold me blew up when I tried using it".
>
> "We've been telling you for years that could happen".
>
> "I didn't think you actually meant it".

      "These new nuclear cells are awesome! But how do you stop them
from leaking on their users?"

A1: "Nuclear power is radioactive. Accept it."

A2: "This is the basic stdlib container. You're supposed to protect yourself."

A3: "The world is changing. Everybody should learn nuclear fission to
use things properly."

      "..."

And while we are at it: if the battery has become more advanced, that
is no reason to strip off the simple default interface. The interface
is not an abstract discussion here, but a real user experience study
(I am going to spread the UX virus), which starts with:

  1. expectations
  2. experience
  3. outcomes

and progressively iterates over 2 to get 3 matching 1 as closely as
possible, without trying to change 1. Changing 1 is equal to changing
people. It is the simple and natural solution that people practice
every day on children and subordinates. The only problem is that it is
an ineffective, hard and useless activity in an open source
environment, because most people, by the nature of their neural
network processes, become more conservative with age. That's why
people invented forks. However, for the encoding problem there are
some good default solutions. You'll have to choose between different
interests anyway, but here they are:

  1. always open() text files in UTF-8 by default
  2. introduce an autodetect mode to the open functions
     1. read and transform on the fly, maintaining a buffer that
        stores the original bytes and their mapping to letters. The
        mapping is updated as the byte frequencies change. When the
        buffer is full, you have the best candidate.
  3. provide sane error messages
     1. messages that users actually understand
     2. messages that tell how to fix the problem
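
To make the three points concrete, here is a rough sketch of what such
a reader could look like. It is only an illustration under stated
assumptions: read_text(), decode_guessing() and CANDIDATE_ENCODINGS
are made-up names, not anything in the stdlib, and the "autodetect"
here is a much cruder stand-in for the frequency-based buffer described
above (it just takes the first candidate encoding that decodes without
errors).

```python
# Hypothetical candidate list (an assumption for this sketch).
# Order matters: latin-1 accepts any byte sequence, so it goes last.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_guessing(data, candidates=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit: %r" % (candidates,))


def read_text(path, encoding="utf-8", autodetect=False):
    """Read a text file as UTF-8 by default, optionally guessing on failure."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        if autodetect:
            text, _ = decode_guessing(data)
            return text
        # An error message that says what went wrong and how to fix it.
        raise ValueError(
            "{!r} is not valid {}: unexpected byte 0x{:02x} at offset {}. "
            "Pass encoding='...' with the file's real encoding, or "
            "autodetect=True to let the reader guess.".format(
                path, encoding, data[exc.start], exc.start
            )
        ) from exc
```

With this, read_text("notes.txt") behaves like a UTF-8 open(), and
read_text("notes.txt", autodetect=True) falls back to guessing instead
of raising. A guess can still be wrong, of course, which is exactly the
tradeoff between different interests mentioned above.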

If the interface becomes more complicated, the last thing you should
do is leave the user alone with the interface problems.

And to conclude, I am not saying that people should not learn about
Unicode, but the learning curve should not be as steep as Python 3
makes it.

