[Python-ideas] isascii()/islatin1()/isbmp()
Steven D'Aprano
steve at pearwood.info
Sun Jul 1 02:59:23 CEST 2012
Serhiy Storchaka wrote:
> As shown in issue #15016 [1], there are use cases where it is useful to
> determine whether a string can be encoded in ASCII or Latin1. When working
> with Tk or Windows console applications, it can be useful to determine
> whether a string can be encoded in UCS2. The C API provides an interface
> for this, but it is not available at the Python level.
>
> I propose adding new methods to the string class: isascii(), islatin1()
> and isbmp() (in addition to such methods as isalpha() or isdigit()). The
> implementation will be trivial.
>
> Pro: The current trick of trying to encode has O(n) complexity and incurs
> the overhead of raising and catching an exception.
Are you suggesting that isascii and friends would be *better* than O(n)? How
can that work -- wouldn't it have to scan the string and look at each character?
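
To make the question concrete, here is a rough pure-Python sketch of what such checks would have to do. The function names are hypothetical and only mirror the proposed method names; they are not an existing API. Each one visits every character in the worst case, so it is still O(n), just without the exception.

```python
# Hypothetical helpers mirroring the proposed method names (an assumption,
# not an existing API). Each scans the string, stopping at the first
# character outside the range, so the worst case is O(n).

def isascii(s):
    """True if every code point in s is below U+0080."""
    return all(ord(c) < 0x80 for c in s)


def islatin1(s):
    """True if every code point in s is below U+0100."""
    return all(ord(c) < 0x100 for c in s)


def isbmp(s):
    """True if every code point in s is below U+10000."""
    return all(ord(c) < 0x10000 for c in s)
```

An empty string passes all three checks, since there is no character to reject.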

Why just ASCII, Latin1 and BMP (whatever that is, googling has not come up
with anything relevant)? It seems to me that adding these three tests will
open the door to a steady stream of requests for new methods named
is<insert encoding name here>.

I suggest that a better API would be a method that takes the name of an
encoding (perhaps defaulting to 'ascii') and returns True|False:

    string.encodable(encoding='ascii') -> True|False

    Return True if string can be encoded using the named encoding,
    otherwise False.
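
A minimal sketch of that idea, written as a plain function rather than a str method since no such method exists. It assumes the simplest possible implementation, namely the try-to-encode trick mentioned in the quoted text, so it shows the shape of the API and says nothing about performance.

```python
def encodable(s, encoding='ascii'):
    """Return True if s can be encoded using the named encoding, else False.

    Hypothetical helper: implemented by attempting the encoding and
    discarding the result. An unknown encoding name still raises
    LookupError, which is deliberately not caught here.
    """
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Usage: one function covers the three proposed cases and any other codec.
print(encodable('spam'))               # ASCII check
print(encodable('caf\u00e9', 'latin-1'))
print(encodable('\u20ac', 'latin-1'))
```

With this shape, supporting another encoding needs no new method, only a different argument.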

One last pedantic issue: strings aren't ASCII or Latin1, etc., but Unicode.
There is enough confusion between Unicode text strings and bytes without
adding methods whose names blur the distinction slightly.

--
Steven