[Python-ideas] discontinue iterable strings

Nick Coghlan ncoghlan at gmail.com
Mon Aug 22 08:00:35 EDT 2016


On 22 August 2016 at 19:47, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Nick Coghlan writes:
>
>  > However, the real problem with this proposal (and the reason why the
>  > switch from 8-bit str to "bytes are effectively a tuple of ints" in
>  > Python 3 was such a pain), is that there are a lot of bytes and text
>  > processing operations that *really do* operate code point by code
>  > point.
>
> Sure, but code points aren't strings in any language I use except
> Python.  And AFAIK strings are the only case in Python where a
> singleton *is* an element, and an element *is* a singleton.

Sure, but the main concern at hand ("list(strobj)" giving a broken out
list of individual code points rather than TypeError) isn't actually
related to the fact those individual items are themselves length-1
strings, it's related to the fact that Python normally considers
strings to be a sequence type rather than a scalar value type.

str is far from the only builtin container type that NumPy gives the
scalar treatment when sticking it into an array:

>>> np.array("abc")
array('abc', dtype='<U3')
>>> np.array(b"abc")
array(b'abc', dtype='|S3')
>>> np.array({1, 2, 3})
array({1, 2, 3}, dtype=object)
>>> np.array({1:1, 2:2, 3:3})
array({1: 1, 2: 2, 3: 3}, dtype=object)

(Interestingly, both bytearray and memoryview get interpreted as
"uint8" arrays, unlike the bytes literal - presumably the latter
discrepancy is a requirement for compatibility with NumPy's
str/unicode handling in Python 2)

That's why I suggested that a scalar proxy based on wrapt.ObjectProxy
that masked all container related protocols could be an interesting
future addition to the standard library (especially if it has been
battle-tested on PyPI first). "I want to take this container instance,
and make it behave like it wasn't a container, even if other code
tries to use it as a container" is usually what people are after when
they find str iteration inconvenient, but "treat this container as a
scalar value, but otherwise expose all of its methods" is an operation
with applications beyond strings.

Not-so-coincidentally, that approach would also give us a de facto
"code point" type: it would be the result of applying the scalar proxy
to a length 1 str instance.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-ideas mailing list