[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)

M.-A. Lemburg mal at egenix.com
Tue Jul 9 09:16:43 CEST 2013


On 08.07.2013 20:52, Bruce Leban wrote:
> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal <me at dpk.io> wrote:
> 
>> Python provides a way to iterate characters of a string by using the
>> string as an iterable. But there's no way to iterate over Unicode graphemes
>> (a cluster of characters consisting of a base character plus a number of
>> combining marks and other modifiers -- or what the human eye would consider
>> to be one "character").
>>
>> I think this ought to be provided either in the unicodedata library,
>> (unicodedata.itergraphemes(string)) which exposes the character database
>> information needed to make this work, or as a method on the built-in str
>> type. (str.itergraphemes() or str.graphemes())
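
To make the proposal above concrete, here is a minimal sketch of such an iterator. It is an illustration only, not an existing unicodedata API. The name iter_graphemes is hypothetical, and the rule used (a base character followed by any code points whose general category is a mark) is a deliberate simplification of the full grapheme cluster rules in Unicode Standard Annex #29.

```python
import unicodedata


def iter_graphemes(text):
    """Yield approximate grapheme clusters from text.

    Assumption: a code point extends the current cluster when its general
    category is a mark (Mn, Mc, or Me). Full UAX #29 segmentation also
    covers Hangul jamo, ZWJ sequences, regional indicators, CRLF, and more.
    """
    cluster = ""
    for ch in text:
        if cluster and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the cluster being built.
            cluster += ch
        else:
            # Any other code point starts a new cluster.
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster
```

With this sketch, list(iter_graphemes("e\u0301a")) yields two clusters, whereas iterating the string directly yields three code points.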
> 
> 
> A common case is wanting to extract the current grapheme or move forward or
> backward one. Please consider these other use cases rather than just adding
> an iterator.
> 
> # Extract the cluster that includes index i (i may be in the middle of
> # the cluster).
> g = unicodedata.grapheme_cluster(str, i)
> # If i is the start of a cluster, return i; otherwise back up to the
> # start of the cluster.
> i = unicodedata.grapheme_start(str, i)
> # Move i to the first index of the previous cluster; return None if
> # there is no previous cluster in the string.
> i = unicodedata.previous_cluster(str, i)
> # Move i to the first index of the next cluster; return None if there
> # is no next cluster in the string.
> i = unicodedata.next_cluster(str, i)
> 
> 
> I think these belong in unicodedata, not str.
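
The index-based operations described above could be sketched roughly as follows. The four public names are taken from the quoted message; none of them exists in the unicodedata module, so they are written here as plain functions. The helpers _is_mark and _grapheme_end are hypothetical, and the cluster rule (a base character plus trailing marks) is a simplifying assumption rather than the full UAX #29 algorithm. Each function expects i to be a valid index into a non-empty string.

```python
import unicodedata


def _is_mark(ch):
    # Assumption: a code point extends a cluster if its category is Mn, Mc, or Me.
    return unicodedata.category(ch).startswith("M")


def grapheme_start(s, i):
    """Return i if it starts a cluster, else back up to the cluster's start."""
    while i > 0 and _is_mark(s[i]):
        i -= 1
    return i


def _grapheme_end(s, i):
    """Return the index just past the cluster that contains index i."""
    j = i + 1
    while j < len(s) and _is_mark(s[j]):
        j += 1
    return j


def grapheme_cluster(s, i):
    """Return the cluster containing index i (i may be mid-cluster)."""
    return s[grapheme_start(s, i):_grapheme_end(s, i)]


def next_cluster(s, i):
    """Return the start index of the next cluster, or None if there is none."""
    j = _grapheme_end(s, i)
    return j if j < len(s) else None


def previous_cluster(s, i):
    """Return the start index of the previous cluster, or None if there is none."""
    start = grapheme_start(s, i)
    return grapheme_start(s, start - 1) if start > 0 else None
```

For s = "e\u0301a\u0308b", index 1 lies inside the first cluster, so grapheme_start(s, 1) backs up to 0 and next_cluster(s, 1) moves forward to 2.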

FWIW: Here's a pre-PEP I once wrote for these things:

http://mail.python.org/pipermail/python-dev/2001-July/015938.html

At the time there was little interest, so I dropped the idea.

-- 
Marc-Andre Lemburg
eGenix.com