Micro Python -- a lean and efficient implementation of Python 3

Tue Jun 3 17:41:12 EDT 2014

Hello,

On Wed, 4 Jun 2014 03:08:57 +1000
Chris Angelico <rosuav at gmail.com> wrote:

[]

> With that encouragement, I just cloned your repo and built it on amd64
> Debian Wheezy. Works just fine! Except... I've just found one fairly
> major problem with your support of Python 3.x syntax. Your str type is
> documented as not supporting Unicode. Is that a current flaw that
> you're planning to remove, or a design limitation? Either way, I'm a
> bit dubious about a purported version 1 that doesn't do one of the
> things that Py3 is especially good at - matched by very few languages
> in its encouragement of best practice with Unicode support.

I should start by saying that it was MicroPython that made me look at
Python3. So for me, it has already done a lot of good by getting me out
from under the rock: now, instead of "at my job, we use python 2.x", I
can report "at my job, we don't wait for our distro to kick us in the
ass, and add 'from __future__ import print_function' whenever we touch
some code".

With that in mind, I, like many others, think that forcing Unicode bloat
upon people by default is the most controversial feature of Python3.
The reason is that you can go a very long way in dealing with the
languages of the world by just treating strings as 8-bit data. I'd say
that's enough for 90% of applications. Unicode is needed only if one
needs to deal with multiple languages *at the same time*, which is
fairly rare (the remaining 10% of apps).

And please keep in mind that MicroPython was originally intended for
(and should remain scalable down to) an MCU. Unicode is needed there
even less, and there are even fewer resources to spend on supporting
Unicode just because.

> 
> What is your str type actually able to support? It seems to store
> non-ASCII bytes in it, which I presume are supposed to represent the
> rest of Latin-1, but I wasn't able to print them out:

There's a work-in-progress document on the differences between CPython
and MicroPython at
https://github.com/micropython/micropython/wiki/Differences. It gives
the following account of this:

"No unicode support is actually implemented. Python3 calls for strict
difference between str and bytes data types (unlike Python2, which has
neutral unified data type for strings and binary data, and separates
out unicode data type). MicroPython faithfully implements str/bytes
separation, but currently, underlying str implementation is the same as
bytes. This means strings in MicroPython are not unicode, but 8-bit
characters (fully binary-clean)."
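
To make the byte-level picture concrete, here is a small sketch. It
runs on CPython 3 (not MicroPython) and only models the "str is really
bytes" view described above; the helper name is_valid_utf8 is made up
for this illustration. It shows why a lone 0xFD byte is not something a
UTF-8 terminal can display, while the two bytes 0xC3 0xBD are.

```python
# Runs on CPython 3; it only models the byte-level view described above.
s = "asdf\xfdqwer"            # in CPython 3, \xfd is one code point (U+00FD)

latin1 = s.encode("latin-1")  # one byte per character, including a raw 0xFD
utf8 = s.encode("utf-8")      # U+00FD becomes the two bytes 0xC3 0xBD


def is_valid_utf8(data):
    """Return True if the bytes decode as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(s), len(latin1), len(utf8))
print(is_valid_utf8(latin1), is_valid_utf8(utf8))
```

If a str is stored and written out as raw 8-bit data, the first form
sends a byte sequence that is not valid UTF-8, and the second sends one
that is.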

> 
> Micro Python v1.0.1-144-gb294a7e on 2014-06-04; UNIX version
> >>> print("asdf\xfdqwer")
> 
> Python 3.5.0a0 (default:6a0def54c63d, Mar 26 2014, 01:11:09)
> [GCC 4.7.2] on linux
> >>> print("asdf\xfdqwer")
> asdfýqwer
> 
> In fact, printing seems to work with bytes:
> 
> >>> print("asdf\xc3\xbdqwer")
> asdfýqwer
> 
> (my terminal uses UTF-8, this is the UTF-8 encoding of the above
> string)
> 
> I would strongly recommend either implementing all of PEP 393, or at
> least making it very clear that this pretends everything is bytes -
> and possibly disallowing any codepoint >127 in any string, which will
> at least mean you're safe on all ASCII-compatible encodings.

MicroPython is not the first "tiny" Python implementation. What sets
MicroPython apart is that being a subset of the language is neither its
aim nor its motto. And yet, it's not a CPython rewrite either. So, while
Unicode support is surely possible, it's unlikely to be done as "all of
PEPxxx". If you ask me, I'd personally envision it being implemented as
UTF-8 (in this regard I agree with, or am influenced by,
http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/). But I don't have plans
to work on Unicode any time soon - the applications I envision for
MicroPython so far fit in those 90% that live happily without Unicode.
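
For illustration only, a UTF-8-backed string could keep the same byte
storage and compute character-level operations by scanning for
continuation bytes. The sketch below is plain Python, not MicroPython
code, and the function names utf8_char_len and utf8_char_at are
hypothetical:

```python
def utf8_char_len(data):
    """Count code points in UTF-8 bytes by skipping continuation bytes.

    Continuation bytes have the bit pattern 10xxxxxx.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_char_at(data, index):
    """Return the index-th code point as a str, using a linear scan."""
    # Offsets of every byte that starts a new code point.
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    start = starts[index]
    end = starts[index + 1] if index + 1 < len(starts) else len(data)
    return data[start:end].decode("utf-8")


raw = "asdf\xfdqwer".encode("utf-8")   # 10 bytes holding 9 characters
print(len(raw), utf8_char_len(raw), utf8_char_at(raw, 4))
```

The trade-off in this kind of design is that storage stays compact and
binary-compatible with bytes, while indexing by character becomes a
scan rather than a constant-time lookup.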

But generally, there's no strict roadmap for MicroPython features.
While the core of the language (parser, compiler, VM) is developed by
Damien, many other features have already been contributed by the
community (the project went open-source at the beginning of the year).
So, if someone wants to see Unicode support badly enough to provide
patches, they will gladly be accepted. The only thing we have
established is that we want to be able to scale down, and thus almost
all features should be configurable.

> 
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com