[Python-ideas] Adding str.isascii() ?

INADA Naoki songofacandy at gmail.com
Fri Jan 26 08:11:31 EST 2018


Do you mean we should fix *all* of CPython's Unicode handling,
not only str.isascii()?

At least, the equality test doesn't handle strings with a wrong kind.

https://github.com/python/cpython/blob/master/Objects/stringlib/eq.h
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Objects/unicodeobject.c#L10871-L10873
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Objects/unicodeobject.c#L10998-L10999

There may be many others, but I'm not sure.
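
For reference, the str.maxchar() idea quoted below can be approximated in
pure Python. This is only a sketch under stated assumptions: maxchar() here
is a hypothetical free function, not an existing str method, and the
exact=False branch rounds the true maximum up to the sequence given in the
PyUnicode_New() doc instead of reading the string's internal kind, which
Python code cannot see.

```python
# Hypothetical pure-Python stand-in for the proposed str.maxchar();
# no such method exists on str.
_KIND_LIMITS = (127, 255, 65535, 1114111)  # sequence from the PyUnicode_New() doc


def maxchar(s, exact=True):
    """Return the largest code point in s (0 for an empty string)."""
    true_max = max(map(ord, s), default=0)
    if exact:
        return true_max
    # Assumption: emulate the kind-based hint by rounding up. A real
    # implementation would read the stored kind without scanning.
    for limit in _KIND_LIMITS:
        if true_max <= limit:
            return limit
    raise ValueError("code point out of range")  # not reachable for a valid str


def isascii(s):
    """True if every code point in s is below 128."""
    return maxchar(s) < 128
```

Note that this sketch cannot reproduce the case under discussion: a string
created through the C API with a maxchar larger than its content needs.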


On Fri, Jan 26, 2018 at 10:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 26.01.2018 12:17, INADA Naoki wrote:
>>> No, because you can pass in maxchar to PyUnicode_New() and
>>> the implementation will take this as a hint to the max code point
>>> used in the string. There is no check done whether maxchar
>>> is indeed the minimum upper bound to the code point ordinals.
>>
>> API doc says:
>>
>> """
>> maxchar should be the true maximum code point to be placed in the string.
>> As an approximation, it can be rounded up to the nearest value in the
>> sequence 127, 255, 65535, 1114111.
>> """
>> https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
>>
>> Since the doc says *should*, strings created with a wrong maxchar
>> are considered invalid objects.
>
> Not really: "should" means should, not must :-) Objects created
> with PyUnicode_New() are valid and ready (this only has a meaning
> for legacy strings).
>
> You can set maxchar to 64k and still just use ASCII as content.
> In some cases, you may want the internal string representation
> to be wchar_t compatible or work with Py_UCS2/4, so both 64k
> and sys.maxunicode are reasonable and valid values.
>
> Overall, I'm starting to believe that a str.maxchar() function
> would be a better choice than to only go for ASCII.
>
> This could have an optional parameter "exact" to force scanning
> the string and returning the actual max code point ordinal
> when set to True (default), or return the approximation based
> on the used kind if not set (which, in many cases, will give
> you a good hint).
>
> For checking ASCII, you'd then write:
>
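> def isascii(s):
>     # Fast path: the kind-based approximation already proves ASCII.
>     if s.maxchar(exact=False) < 128:
>         return True
>     # Slow path: scan the string for the actual max code point.
>     return s.maxchar() < 128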
>
> --
> Marc-Andre Lemburg
> eGenix.com



-- 
INADA Naoki  <songofacandy at gmail.com>

