[Python-Dev] Internal representation of strings and Micropython

Guido van Rossum guido at python.org
Wed Jun 4 07:23:07 CEST 2014


On Tue, Jun 3, 2014 at 7:32 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Wed, Jun 4, 2014 at 11:17 AM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> > * Having a build-time option to restrict all strings to ASCII-only.
> >
> >   (I think what they mean by that is that strings will be like Python 2
> >   strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)
>
> What I was actually suggesting along those lines was that the str type
> still be notionally a Unicode string, but that any codepoints >127
> would either raise an exception or blow an assertion, and all the code
> to handle multibyte representations would be compiled out.


That would be a pretty lousy option.

> So there'd
> still be a difference between strings of text and streams of bytes,
> but all encoding and decoding to/from ASCII-compatible encodings would
> just point to the same bytes in RAM.
>

I suppose this is why you propose to reject 128-255?
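The reason 128-255 matters can be shown with plain `str.encode`. Only code points 0-127 encode to identical bytes in every ASCII-compatible encoding. From 128 upward, Latin-1 and UTF-8 already disagree, so a shared buffer would stop being possible. The helper name `shares_bytes_across_encodings` is made up for this sketch, and Latin-1 and UTF-8 are used as two representative ASCII-compatible encodings.

```python
# Hypothetical helper: does `text` encode to byte-identical output in several
# ASCII-compatible encodings? If so, an implementation could in principle hand
# out the same buffer for encode/decode instead of copying.
ENCODINGS = ("latin-1", "utf-8")


def shares_bytes_across_encodings(text: str) -> bool:
    encoded = []
    for name in ENCODINGS:
        try:
            encoded.append(text.encode(name))
        except UnicodeEncodeError:
            # Not representable at all in this encoding (e.g. U+20AC in Latin-1).
            return False
    return all(b == encoded[0] for b in encoded)


if __name__ == "__main__":
    print(shares_bytes_across_encodings("hello"))
    print("caf\xe9".encode("latin-1"))  # one byte for U+00E9
    print("caf\xe9".encode("utf-8"))    # two bytes for U+00E9
```

Run over `chr(0)` through `chr(255)`, the helper returns True for exactly the first 128 code points.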


> Risk: Someone would implement that with assertions, then compile with
> assertions disabled, test only with ASCII, and have lurking bugs.
>

Never mind disabling assertions -- even with enabled assertions you'd have
to expect most Python programs to fail with non-ASCII input.
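Here is a rough pure-Python model of that failure mode. `AsciiStr` and `greet` are hypothetical names invented for illustration, and no CPython or MicroPython build provides them. The model rejects any code point above 127, which is how the proposed build option would behave. An ordinary function then breaks on the first non-ASCII input, such as the name "André".

```python
class AsciiStr(str):
    """Hypothetical model of a str restricted to code points 0-127.

    Not a real API: it only mimics the proposed ASCII-only build option.
    """

    def __new__(cls, value: str) -> "AsciiStr":
        for index, ch in enumerate(value):
            if ord(ch) > 127:
                raise ValueError(
                    f"code point U+{ord(ch):04X} at index {index} is outside ASCII"
                )
        return super().__new__(cls, value)


def greet(name: str) -> AsciiStr:
    # A perfectly ordinary function that builds text from user input.
    return AsciiStr(f"Hello, {name}!")


if __name__ == "__main__":
    print(greet("Guido"))
    try:
        greet("Andr\xe9")
    except ValueError as exc:
        print("rejected:", exc)
```

Enabling or disabling assertions does not change the outcome. The program is only correct while its input stays ASCII.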

Then again, the UTF-8 option would be pretty devastating too for anything
manipulating strings (especially since many Python APIs are defined using
indexes, e.g. the re module).
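Indexing is the cost at issue here. With UTF-8 as the internal form, finding the n-th code point means walking the buffer from the start, so `s[i]` becomes O(n). The same goes for turning a code-point offset such as an `re` match's `start()` into a byte position. The function below is a hypothetical sketch that assumes well-formed UTF-8 input. It does not show how any real implementation indexes its strings.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return code point number `index` of well-formed UTF-8 `data`.

    UTF-8 has no O(1) way to locate the n-th code point, so this scans
    from the beginning, making each lookup O(n) in the buffer length.
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = 0
    pos = 0
    while pos < len(data):
        lead = data[pos]
        # Sequence length is determined by the lead byte's high bits.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if count == index:
            return data[pos:pos + width].decode("utf-8")
        count += 1
        pos += width
    raise IndexError("string index out of range")
```

A loop such as `for i in range(len(s)): s[i]` becomes quadratic under this representation, and that is the kind of string-manipulating code the reply is concerned about.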

Why not support variable-width strings like CPython 3.4?
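For reference, CPython's flexible string representation (PEP 393, introduced in CPython 3.3) stores each string with 1, 2 or 4 bytes per code point. The width is chosen from the string's widest character, so indexing stays O(1) arithmetic. The sketch below models only that width choice and the indexing. It leaves out CPython's real object layout (compact ASCII objects, cached UTF-8, and so on), and the function names are hypothetical.

```python
from __future__ import annotations


def flexible_encode(text: str) -> tuple[int, bytes]:
    """Pick a per-string width from the widest code point, PEP 393 style."""
    max_cp = max(map(ord, text), default=0)
    if max_cp <= 0xFF:
        width = 1  # Latin-1 range: 1 byte per code point
    elif max_cp <= 0xFFFF:
        width = 2  # Basic Multilingual Plane: 2 bytes per code point
    else:
        width = 4  # beyond the BMP: 4 bytes per code point
    buf = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, buf


def flexible_char_at(width: int, buf: bytes, index: int) -> str:
    # Width is fixed within one string, so the offset is a multiplication.
    start = index * width
    if index < 0 or start >= len(buf):
        raise IndexError("string index out of range")
    return chr(int.from_bytes(buf[start:start + width], "little"))
```

Pure-ASCII text, the common case on a small device, still costs one byte per character. Only strings that actually contain wider characters pay for them.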

-- 
--Guido van Rossum (python.org/~guido)