[Python-Dev] Internal representation of strings and Micropython

Paul Sokolovsky pmiscml at gmail.com
Thu Jun 5 13:25:28 CEST 2014


Hello,

On Thu, 05 Jun 2014 16:54:11 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Paul Sokolovsky writes:
> 
>  > Please put that in perspective when alarming over O(1) indexing of
>  > inherently problematic niche datatype. (Again, it's not my or
>  > MicroPython's fault that it was forced as standard string type.
>  > Maybe if CPython seriously considered now-standard UTF-8 encoding,
>  > results of what is "str" type might be different. But CPython has
>  > gigabytes of heap to spare, and for MicroPython, every half-bit is
>  > precious).
> 
> Would you please stop trolling?  The reasons for adopting Unicode as a
> separate data type were good and sufficient in 2000, and they remain

If it was kept at "separate data type" bay, there wouldn't be any
problem. But it was made "one and only string type", and all strife
started then.

And there going to be "trolling" as long as Python developers and
decision-makers will ignore (troll?) outcry from the community (again, I
was surprised and not surprised to see ~50% of traffic on python-list
touches Unicode issues). 

Well, I understand the plan - hoping that people will "get over this".
And I'm personally happy to stay away from this "trolling", but any
discussion related to Unicode goes in circles and returns to feeling
that Unicode at the central role as put there by Python3 is misplaced.

Then for me, it's just a matter of job security and personal future - I
don't want to spend rest of my days as a javascript (or other idiotic
language) monkey. And the message is clear in the air
(http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/ and
elsewhere): if Python strings are now in Go, and in Python itself are
now Java strings, all causing strife, why not go cruising around and see
what's up, instead of staying strong, and growing bigger, community.

> so today, even if you have been fortunate enough not to burn yourself
> on character-byte conflation yet.
> 
> What matters to you is that str (unicode) is an opaque type -- there
> is no specification of the internal representation in the language
> reference, and in fact several different ones coexist happily across
> existing Python implementations -- and you're free to use a UTF-8
> implementation if that suits the applications you expect for
> MicroPython.
> 
> PEP 393 exists, of course, and specifies the current internal
> representation for CPython 3.  But I don't see anything in it that
> suggests it's mandated for any other implementation.

I knew all this before very well. What's strange is that other
developers don't know, or treat seriously, all of the above. That's why
gentleman who kindly was interested in adding Unicode support to
MicroPython started with the idea of dragging in CPython implementation.
And the only effect persuasion that it's not necessarily the best
solution had, was that he started to feel that he's being manipulated
into writing something ugly, instead of the bright idea he had.

That's why another gentleman reduces it to: "O(1) on string indexing or
not a Python!". 

And that's why another gentleman, who agrees to UTF-8 arguments, still
gives an excuse
(https://mail.python.org/pipermail/python-dev/2014-June/134727.html):
"In this context, while a fixed-width encoding may be the correct
choice it would also likely be the wrong choice."


In this regard, I'm glad to participate in mind-resetting discussion.
So, let's reiterate - there's nothing like "the best", "the only right",
"the only correct", "righter than", "more correct than" in CPython's
implementation of Unicode storage. It is *arbitrary*. Well, sure, it's
not arbitrary, but based on requirements, and these requirements match
CPython's (implied) usage model well enough. But among all possible
sets of requirements, CPython's requirements are no more valid that
other possible. And other set of requirement fairly clearly lead to
situation where CPython implementation is rejected as not correct for
those requirements at all.



-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


More information about the Python-Dev mailing list