[issue30717] Add unicode grapheme cluster break algorithm

Manish report at bugs.python.org
Tue Jan 7 03:20:10 EST 2020


Manish <manishsmail at gmail.com> added the comment:

> Does `unicode-segmentation` support all platforms that CPython supports?

It's `no_std`, so it supports everything the base Rust compiler supports (which is basically everything LLVM supports).

And yeah, if there's a platform in CPython's support matrix that Rust doesn't cover, this isn't going to work.


However, I suggested this more for the potential PyPI package. If you're working this into CPython you'd have to figure out how best to include Rust stuff in your build system, which seems like a giant chunk of scope creep :)



For including in CPython I'd suggest looking through unicode-segmentation and writing a C version of it. We use a Python script[1] to generate the data tables; this might be something y'all can use. Swift's UAX #29 implementation is also quite interesting, but it's baked deeply into the language, so it's less useful as a starting point.


 [1]: https://github.com/unicode-rs/unicode-segmentation/blob/master/scripts/unicode.py
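To give a rough idea of what "generate the data tables" could look like on the CPython side, here is a minimal sketch. It is not the script linked above and not anything CPython ships. It assumes input in the format of the UCD file GraphemeBreakProperty.txt (a few sample lines are embedded so it runs offline), and the C names `gcb_range`, `gcb_table` and the `GCB_` prefix are made up for illustration.

```python
import bisect

# A few lines in the format of the UCD file GraphemeBreakProperty.txt.
# A real generator would read the full file for the targeted Unicode version.
SAMPLE = """\
000D          ; CR # Cc       <control-000D>
000A          ; LF # Cc       <control-000A>
0300..036F    ; Extend # Mn [112] COMBINING GRAVE ACCENT..COMBINING LATIN SMALL LETTER X
200D          ; ZWJ # Cf       ZERO WIDTH JOINER
1F1E6..1F1FF  ; Regional_Indicator # So  [26] REGIONAL INDICATOR SYMBOL LETTER A..Z
"""


def parse_ranges(text):
    """Return a sorted list of (first, last, property) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        span, prop = (part.strip() for part in line.split(";"))
        first, _, last = span.partition("..")
        # A single code point has no "..", so it is a one-element range.
        ranges.append((int(first, 16), int(last or first, 16), prop))
    ranges.sort()
    return ranges


def emit_c_table(ranges):
    """Render the ranges as a C array (all C identifiers are hypothetical)."""
    rows = ",\n".join(
        f"    {{0x{first:04X}, 0x{last:04X}, GCB_{prop}}}"
        for first, last, prop in ranges
    )
    return "static const struct gcb_range gcb_table[] = {\n" + rows + ",\n};"


def lookup(ranges, cp):
    """Binary-search the sorted ranges, mirroring what the C lookup would do."""
    starts = [first for first, _, _ in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][1] >= cp:
        return ranges[i][2]
    return "Other"  # code points not listed default to Other


if __name__ == "__main__":
    table = parse_ranges(SAMPLE)
    print(emit_c_table(table))
    print(lookup(table, 0x0301))
```

The sorted-range-plus-binary-search layout is just one option; a two-stage lookup table would trade a bigger table for faster lookups.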

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue30717>
_______________________________________

