[Python-Dev] Internal representation of strings and Micropython

Fri Jun 6 13:15:35 CEST 2014

Hello,

On Thu, 5 Jun 2014 23:15:54 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 5 June 2014 22:37, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> > On Thu, 5 Jun 2014 22:20:04 +1000
> > Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> problems caused by trusting the locale encoding to be correct, but
> >> the startup code will need non-trivial changes for that to happen
> >> - the C.UTF-8 locale may even become widespread before we get
> >> there).
> >
> > ... And until those golden times come, it would be nice if Python
> > did not force its perfect world model, which unfortunately is not
> > based on surrounding reality, and let users solve their encoding
> > problems themselves - when they need, because again, one can go
> > quite a long way without dealing with encodings at all. Whereas now
> > Python3 forces users to deal with encoding almost universally, but
> > forcing a particular one for all strings (which, again, doesn't
> > correspond to the state of surrounding reality). I already hear
> > the response that it's good that users are taught to deal with
> > encoding, that it will make them write correct programs, but that's
> > a bit far away from the original aim of making writing "correct"
> > programs easy and pleasant. (And definitions of "correct" vary.)
> 
> As I've said before in other contexts, find me Windows, Mac OS X and
> JVM developers, or educators and scientists that are as concerned by
> the text model changes as folks that are primarily focused on Linux
> system (including network) programming, and I'll be more willing to
> concede the point.

Well, but this question reduces to finding out (or specifying) who the
target audiences of Python are. It has always been (with a bow to Guido)
an outpost of scientific users (and will probably remain prominent in
that role even if there were a mass exodus of other categories of
users). But Python has always had its share as a system scripting
language among Perl-haters, and with Perl going flatline, I guess it's
fair to say that Python is a major system scripting and service
implementation language.

To whom do features like memoryview, array.array, in-place
input operations, etc. cater? To scientists? I'm sure most of them are
just happy with sticking "@jit" on their kernel functions. And
scientists who bother with memoryviews for their data structures are
system-level-ish programmers too.
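
For concreteness, here is a minimal sketch of the kind of code those
features serve. The in-memory stream is an assumption standing in for a
real file or socket, and all variable names are illustrative only:

```python
import array
import io

# Assumed data source: an in-memory stream standing in for a file or socket.
stream = io.BytesIO(bytes(range(16)))

# Preallocate one buffer and fill it in place; readinto() writes into the
# caller's memory instead of returning a freshly allocated bytes object.
buf = bytearray(8)
view = memoryview(buf)

n = stream.readinto(view[:4])    # fills buf[0:4]; slicing a memoryview does not copy
n += stream.readinto(view[4:])   # fills buf[4:8]

# array.array supports the same buffer protocol for typed data.
samples = array.array("H", [0] * 4)   # four unsigned 16-bit items
stream.readinto(samples)              # the remaining 8 bytes land in the array
```

No text encoding is involved at any step; the data stays bytes from the
stream to the buffer.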

So, no wonder that the Linux crowd cries at Python3 - it makes doing
simple things unnecessarily complicated.

> Windows, Mac OS X, and the JVM are all opinionated about the text
> encodings to be used at platform boundaries (using UTF-16, UTF-8 and
> UTF-16, respectively). By contrast, Linux (or, more accurately, POSIX)
> says "well, it's configurable, but we won't provide a reliable
> mechanism for finding out what the encoding is. So either guess as

[]

Yes, I understand the complexity of developing a cross-platform language
with advanced features. But I may offer another look at all this
activity: Python3 was brave enough to do a revolution in its own world
(catching a lot of its users by surprise), but surely not brave enough
to do a revolution around itself, by saying something like "We choose
ONE, the most right, and even the most used (per bytes transferred)
encoding as our standard I/O encoding. Grow up or explicitly specify the
encoding which you personally need.".

Surely, it didn't do that - it makes no sense to fight the world. But
then Python3 is sympathetic to Java's desire to use "UTF-16" instead
of the "right" encoding, and not so to the Unix desire to treat
encodings as a separate level from content (and to treat Unicode as
nothing else but yet another arbitrary encoding, which it is formally,
and will be de facto for a long time, however sad that is). So, maybe
"cross-platform" should have meant "don't do implicit conversions".
Because see, Python2 had a problem with implicit encoding conversion
when str and unicode objects were mixed, and Python3 has a problem with
implicit conversions whenever str is used at all.
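
A minimal sketch of where that conversion happens in Python3, assuming
a temporary file holding Latin-1 bytes (the file and the variable names
are illustrative only). Binary mode hands the bytes back untouched,
while text mode always decodes: with the encoding the caller names, or,
if none is given, with one taken from the locale.

```python
import os
import tempfile

raw = b"caf\xe9\n"   # Latin-1 bytes; not a valid UTF-8 sequence

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(raw)

    # Binary mode: no conversion, the content stays bytes.
    with open(path, "rb") as f:
        data = f.read()

    # Text mode with an explicitly chosen encoding: the caller decides.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()

    # Text mode with an encoding that does not match the content fails.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
finally:
    os.remove(path)
```

Leaving out the encoding argument in text mode makes the result depend
on the environment the script happens to run in.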

Anyway, I appreciate detailed responses, and understand what you
(Python3 developers) are trying to achieve, and appreciate your work,
and hope it all works out. Each user has their own concerns about
Unicode. Mine are efficiency and layering. But once MicroPython has
UTF-8 support I will be much more relaxed about it. Layering is harder
to accept, but hopefully can be tackled too, both in my own mind and on
the technical side. I hope other users will find their peace with
Unicode too!

[]

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com