[issue45692] IDLE: define word/id chars in one place.

Terry J. Reedy report at bugs.python.org
Tue Nov 2 15:20:07 EDT 2021


Terry J. Reedy <tjreedy at udel.edu> added the comment:

There have been occasional discussions about IDLE not being properly unicode aware in some of its functions.  The discussions have foundered on the following facts, and no fix has been made.

1. The direct replacement string, your 'identcontchars', seems too big. We have always assumed that O(n) linear scans would be too slow.
2. A frozenset should give O(1) lookup, which is likely fast enough, but would be even bigger.
3. The string methods operate on and scan through multiple chars, whereas IDLE wants to test 1 char at a time.
4. Even if the O(n*n) behavior of multiple calls is acceptable, there is no function for unicode continuation chars.  s.isidentifier requires that the first character be a start char, which is to say, not a digit.  s.isalnum is false for '_'.  (Otherwise, it would work.)
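
To make blocker 4 concrete, here is a small sketch using only the built-in str methods.  It shows why neither str.isidentifier nor str.isalnum can be used as is to test a single continuation char.

```python
# str.isidentifier judges the whole string, so a lone digit fails
# even though a digit is fine in any position after the first.
print('1'.isidentifier())    # False: a digit cannot start an identifier
print('a1'.isidentifier())   # True: a digit can continue one

# str.isalnum accepts digits but rejects the underscore.
print('1'.isalnum())         # True
print('_'.isalnum())         # False
print('_'.isidentifier())    # True
```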

I would like to do better this time.  Possible responses to the blockers:

1. Correct; reject.

2. Maybe adding an elephant is better than keeping multiple IDLE features disabled for non-ascii users.  How big?

>>> import sys
>>> fz = frozenset(c for c in map(chr, range(0x110000)) if ('a'+c).isidentifier)
>>> sys.getsizeof(fz)
33554648

Whoops, each 2 or 4 byte slice of the underlying array becomes 76 bytes + 8 bytes * size of hash array.  Not practical either.
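
One caveat about the measurement: the expression in the transcript refers to the isidentifier method without calling it, and a bound method is always true, so the set appears to hold all 0x110000 code points rather than only continuation chars.  A sketch of the same measurement with the method called follows.  The name id_continue is only illustrative, and the numbers depend on the Python version and its Unicode database, so none are quoted here.

```python
import sys

# Every code point c for which 'a' + c is still a valid identifier,
# i.e. c can appear after the first char.  Note the call: isidentifier().
id_continue = frozenset(
    c for c in map(chr, range(sys.maxunicode + 1))
    if ('a' + c).isidentifier()
)

print(len(id_continue))            # number of continuation chars found
print(sys.getsizeof(id_continue))  # the set's own table, not the str objects
```

sys.getsizeof reports only the set object itself; the one-char strings it refers to take additional memory.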

3. For at least some of the uses, the repeated calls may be fast enough.

4. We can synthesize s.isidcontinue with "c.isalnum() or c == '_'".   "c.isidentifier() or c.isdigit()" would also work but should be slower.
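
As a rough way to evaluate responses 3 and 4, the sketch below compares the synthesized test with a reference test built on ('a'+c).isidentifier(), then times one call of each.  The function names are made up for illustration and are not IDLE code; the timings are machine dependent.

```python
import timeit

def is_id_continue(c):
    """Reference test: c may follow a start char in an identifier."""
    return ('a' + c).isidentifier()

def is_id_continue_synth(c):
    """The synthesized test from response 4."""
    return c.isalnum() or c == '_'

# Spot checks on a few single chars.
for c in 'a', '9', '_', ' ', '-':
    print(repr(c), is_id_continue(c), is_id_continue_synth(c))

# Collect the code points on which the two tests disagree.
diffs = [c for c in map(chr, range(0x110000))
         if is_id_continue(c) != is_id_continue_synth(c)]
print(len(diffs), 'disagreements')

# Rough cost of a single-char call to each test.
for f in is_id_continue, is_id_continue_synth:
    print(f.__name__, timeit.timeit(lambda: f('x'), number=100_000))
```

The two tests can disagree on some code points.  For example, a combining mark such as U+0301 may continue an identifier but is not alphanumeric, so the disagreement count indicates how close the approximation is.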

Any other ideas?  I will look at the use cases next.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45692>
_______________________________________

