grapheme cluster library

Rustom Mody rustompmody at gmail.com
Sat Oct 21 13:18:21 EDT 2017


On Saturday, October 21, 2017 at 9:22:24 PM UTC+5:30, MRAB wrote:
> On 2017-10-21 05:11, Rustom Mody wrote:
> > Is there a recommended library for manipulating grapheme clusters?
> > 
> > In particular, in devanagari
> > क् + ि = कि
> > in (pseudo)unicode names
> > KA-letter + I-sign = KI-composite-letter
> > 
> > I would like to be able to handle KI as a letter rather than two code-points.
> > Can of course write an automaton to group but guessing that its already
> > available some place…
> > 
> You can use the regex module to split a string into graphemes:
> 
> regex.findall(r'\X', string)

Thanks MRAB
Yes as I said I discovered r'\X'
Ultimately my code was (effectively) one line!

print("".join(map[x] for x in findall(r'\X', l)))

with map being a few 100 elements of a dictionary such as
map = {
 ...
 'ॐ': "OM",
...
}

$ cat purnam-deva 
ॐ पूर्णमदः पूर्णमिदं पूर्णात्पुर्णमुदच्यते
पूर्णस्य पूर्णमादाय पूर्णमेवावशिष्यते ॥

$ ./devanagari2roman.py purnam-deva 
OM pUraNamadaH pUraNamidaM pUraNAtpuraNamudachyate
pUraNasya pUraNamAdAya pUraNamavAvashiShyate ..
OM shAntiH shAntiH shAntiH ..

Basically, an inversion of the itrans input method
https://en.wikipedia.org/wiki/ITRANS



More information about the Python-list mailing list