[Python-ideas] Adding str.isascii() ?

Fri Jan 26 08:37:14 EST 2018

2018-01-26 13:39 GMT+01:00 Steven D'Aprano <steve at pearwood.info>:
> I have no objection to isascii, but I don't think it goes far enough.
> Sometimes I want to know whether a string is compatible with Latin-1 or
> UCS-2 as well as ASCII. For that, I used a function that exposes the
> size of code points in bits:

Really? I have never needed such a check in practice. Would you mind
elaborating on your use case?

ASCII is very common and hardcoded in many file formats and
protocols. Other character sets are rarer.
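
For reference, without a str.isascii() method the check is usually
written along the lines below. This is only a sketch: is_ascii() and
is_ascii_via_encode() are hypothetical helper names, not existing APIs,
and both are O(n) scans.

```python
def is_ascii(text: str) -> bool:
    """Return True if every code point in text is below U+0080."""
    # Hypothetical helper: a plain O(n) scan over the code points.
    return all(ord(ch) < 0x80 for ch in text)


def is_ascii_via_encode(text: str) -> bool:
    """Same check, letting the ASCII codec do the scan."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True
```

Both helpers treat the empty string as ASCII, which matches the
"if self else 0" branch of the size property quoted below.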

> @property
> def size(self):
>     # This can be implemented much more efficiently in CPython.
>     c = ord(max(self)) if self else 0
>     if c <= 0x7F:
>         return 7
>     elif c <= 0xFF:
>         return 8
>     elif c <= 0xFFFF:
>         return 16
>     else:
>         assert c <= 0x10FFFF
>         return 21

An efficient O(1) implementation can be annoying to write, and I don't
think that it's worth it. Python doesn't have this method, and I have
never seen any user request this feature.

IMHO this size() idea comes from the PEP 393 design rather than from a
real use case.

In CPython, str.isascii() would be an O(1) operation since the result
is "cached" by design in the implementation of PyUnicode.
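
To illustrate what "cached" means here, a toy model (not the actual
PyUnicode code; ToyStr and its attributes are made-up names): the scan
happens once when the object is created, so later queries only read a
stored flag.

```python
class ToyStr:
    """Toy model of a string that records its ASCII-ness at creation."""

    def __init__(self, text: str) -> None:
        self.text = text
        # One O(n) scan at construction time; the result is stored.
        self._ascii = all(ord(ch) < 0x80 for ch in text)

    def isascii(self) -> bool:
        # O(1): only reads the flag computed in __init__.
        return self._ascii
```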

PEP 393 is an implementation detail of CPython. PyPy now uses UTF-8
internally, not the PEP 393 representation (UCS1, UCS2 or UCS4). PyPy
might want to use a bit to cache whether the string is ASCII or not,
but I'm not sure that it's worth it to compute the maximum character
or the size() result.
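
With a UTF-8 representation, an ASCII flag is cheap to derive when the
byte length and the number of code points are both already known: the
two are equal exactly when the string is pure ASCII. A rough sketch,
with a hypothetical function name and no claim about how PyPy actually
does it:

```python
def utf8_is_ascii(data: bytes, num_codepoints: int) -> bool:
    """Assumes data is the valid UTF-8 encoding of num_codepoints code points."""
    # An ASCII code point takes exactly 1 byte in UTF-8; any other code
    # point takes 2 to 4 bytes, so the lengths match only for pure ASCII.
    return len(data) == num_codepoints


ascii_text = "abc"
other_text = "caf\u00e9"
ascii_result = utf8_is_ascii(ascii_text.encode("utf-8"), len(ascii_text))
other_result = utf8_is_ascii(other_text.encode("utf-8"), len(other_text))
```

The maximum character or a size() value cannot be derived from these
two lengths alone, which is why they would need extra work.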

Victor