[issue30717] Add unicode grapheme cluster break algorithm

Serhiy Storchaka report at bugs.python.org
Thu Aug 3 07:21:37 EDT 2017


Serhiy Storchaka added the comment:

Issue18406 has been closed as a duplicate of this issue. It contains useful links, in particular a proto-PEP for a Unicode Indexing Helper Module:

http://mail.python.org/pipermail/python-dev/2001-July/015938.html

I agree that providing a grapheme iterator would be useful. But it would also be useful to provide word and sentence iterators.

Should the iterators provide just substrings, or also their positions? I think emitting (pos, substring) pairs would be more useful: it is easier to create an iterator of substrings from an iterator of pairs than the opposite.
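A minimal sketch of the (pos, substring) design, to make the trade-off concrete. The names `iter_graphemes` and `_extends` are hypothetical, and the boundary test is a deliberately crude approximation of the UAX #29 grapheme cluster rules: it only keeps CR LF together, breaks after CR or LF, and attaches nonspacing/enclosing marks and ZERO WIDTH JOINER to the preceding character. A real implementation would need the Grapheme_Cluster_Break property data from the Unicode Character Database (Hangul syllables, regional indicators, emoji sequences, prepend and spacing marks are all ignored here).

```python
import unicodedata
from typing import Iterator, Tuple


def _extends(ch: str) -> bool:
    # Rough stand-in for Grapheme_Cluster_Break=Extend/ZWJ (an assumption,
    # not the real property): nonspacing (Mn) and enclosing (Me) marks,
    # plus ZERO WIDTH JOINER.
    return unicodedata.category(ch) in ("Mn", "Me") or ch == "\u200d"


def iter_graphemes(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (pos, substring) pairs for approximate grapheme clusters.

    Hypothetical helper; only a few UAX #29 rules are approximated.
    """
    start = 0
    for i in range(1, len(text) + 1):
        if i < len(text):
            prev, cur = text[i - 1], text[i]
            if prev == "\r" and cur == "\n":
                continue  # keep CR LF together
            if prev not in "\r\n" and _extends(cur):
                continue  # do not break before a combining mark or ZWJ
        yield start, text[start:i]
        start = i


text = "e\u0301\r\nx"
pairs = list(iter_graphemes(text))
# Deriving plain substrings from the pairs is a one-liner:
substrings = [s for _, s in pairs]
print([pos for pos, _ in pairs])  # [0, 2, 4]
```

Going from pairs to substrings just drops the first element of each tuple, whereas a substrings-only iterator would force callers who need positions to keep their own running offset.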

Alternatively, an iterator could emit slice objects, or special objects similar to re match objects.
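A sketch of the two alternatives, under the same simplifying assumptions as a (pos, substring) design: the boundary test below is only a rough approximation of UAX #29, and `iter_grapheme_slices`, `iter_grapheme_matches` and `GraphemeMatch` are hypothetical names. `GraphemeMatch` merely borrows the method names `start()`, `end()`, `span()` and `group()` and the `string` attribute from `re.Match`; it is not an existing API.

```python
import unicodedata
from dataclasses import dataclass
from typing import Iterator, Tuple


def _is_boundary(text: str, i: int) -> bool:
    """Approximate grapheme boundary test at index i (0 < i < len(text)).

    Simplification (assumption): keep CR LF together, break after CR/LF,
    and do not break before nonspacing/enclosing marks or ZWJ.
    """
    prev, cur = text[i - 1], text[i]
    if prev == "\r" and cur == "\n":
        return False
    if prev in "\r\n":
        return True
    return not (unicodedata.category(cur) in ("Mn", "Me") or cur == "\u200d")


def iter_grapheme_slices(text: str) -> Iterator[slice]:
    """Yield slice objects covering approximate grapheme clusters."""
    start = 0
    for i in range(1, len(text) + 1):
        if i == len(text) or _is_boundary(text, i):
            yield slice(start, i)
            start = i


@dataclass(frozen=True)
class GraphemeMatch:
    """Hypothetical match-like object, loosely modelled on re.Match."""
    string: str
    _start: int
    _end: int

    def start(self) -> int:
        return self._start

    def end(self) -> int:
        return self._end

    def span(self) -> Tuple[int, int]:
        return (self._start, self._end)

    def group(self) -> str:
        return self.string[self._start:self._end]


def iter_grapheme_matches(text: str) -> Iterator[GraphemeMatch]:
    """Wrap each slice in a match-like object."""
    for sl in iter_grapheme_slices(text):
        yield GraphemeMatch(text, sl.start, sl.stop)


text = "a\u0308b"
clusters = [text[sl] for sl in iter_grapheme_slices(text)]
print([m.span() for m in iter_grapheme_matches(text)])  # [(0, 2), (2, 3)]
```

Slices are cheap and can be applied directly to the original string, while a match-like object carries the string with it and offers both positions and the substring through one familiar interface.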

----------
nosy: +mrabarnett

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30717>
_______________________________________

