transliteration in Python
Martin von Loewis
loewis at informatik.hu-berlin.de
Fri Jan 4 12:14:36 EST 2002
Giorgi Lekishvili <gleki at gol.ge> writes:
> Can someone smarter than me explain the common syntax of transliteration
> of one encoding to another?
There is no syntax for doing transliteration. You have to write a
transliteration algorithm.
> Suppose we have string sl="shchi" and want to recode it in "KOI8-R". How
> can this be achieved?
First, write a codec for an encoding called, say, 'russian-translit'.
Its decoding algorithm may contain a fragment that reads
    if input.startswith("shch"):
        input = input[4:]
        output += u"\u0449"
        # alternatively, write
        # output += u'\N{CYRILLIC SMALL LETTER SHCHA}'
    ...
    if input.startswith("i"):
        input = input[1:]
        output += u'\N{CYRILLIC SMALL LETTER I}'
(assuming that "shchi" is transliterated as SHCHA, I)
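The chain of startswith tests can also be driven by a table. What follows is only a minimal sketch, written in Python 3 syntax (the fragment above is Python 2). The TRANSLIT table and the transliterate function are hypothetical names, and the mapping is deliberately incomplete. Longer spellings are tried first so that "shch" is not consumed as "sh" or "s".

```python
# Hypothetical, deliberately incomplete table: Latin spelling -> Cyrillic letter.
TRANSLIT = {
    "shch": "\N{CYRILLIC SMALL LETTER SHCHA}",
    "sh": "\N{CYRILLIC SMALL LETTER SHA}",
    "ch": "\N{CYRILLIC SMALL LETTER CHE}",
    "s": "\N{CYRILLIC SMALL LETTER ES}",
    "i": "\N{CYRILLIC SMALL LETTER I}",
}

# Try longer spellings first so that "shch" wins over "sh" and "s".
KEYS = sorted(TRANSLIT, key=len, reverse=True)


def transliterate(text):
    """Greedy longest-match transliteration of a Latin-spelled string."""
    output = []
    pos = 0
    while pos < len(text):
        for key in KEYS:
            if text.startswith(key, pos):
                output.append(TRANSLIT[key])
                pos += len(key)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(
                "no transliteration for %r at position %d" % (text[pos], pos)
            )
    return "".join(output)
```

Calling transliterate("shchi") yields SHCHA followed by I; any spelling missing from the table raises ValueError instead of being silently dropped.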
With this codec, you can now write
    uni = unicode("shchi", "russian-translit")
    koi = uni.encode("koi8-r") # the KOI8-R codec is already supported
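For the lookup of 'russian-translit' to succeed, the codec has to be registered with the codecs module. The following is a rough sketch of one way to do that, assuming Python 3, where bytes.decode takes the place of the unicode() call. TABLE, translit_decode, translit_encode and search are hypothetical names, the table covers only the two letters needed here, and the errors argument is ignored.

```python
import codecs

# Hypothetical, incomplete table; longest spellings must come first.
TABLE = [("shch", "\u0449"), ("i", "\u0438")]


def translit_decode(data, errors="strict"):
    # A decoder returns (decoded text, number of input bytes consumed).
    # The errors argument is ignored in this sketch.
    raw = bytes(data)
    text = raw.decode("ascii")
    output, pos = [], 0
    while pos < len(text):
        for latin, cyrillic in TABLE:
            if text.startswith(latin, pos):
                output.append(cyrillic)
                pos += len(latin)
                break
        else:
            raise UnicodeDecodeError(
                "russian-translit", raw, pos, pos + 1, "no transliteration rule"
            )
    return "".join(output), len(raw)


def translit_encode(text, errors="strict"):
    # Only the decoding direction is sketched here.
    raise NotImplementedError("this sketch only decodes")


def search(name):
    # The registry normalizes the requested name before calling us;
    # depending on the Python version, hyphens may arrive as underscores.
    if name.replace("-", "_") == "russian_translit":
        return codecs.CodecInfo(
            name="russian-translit",
            encode=translit_encode,
            decode=translit_decode,
        )
    return None


codecs.register(search)

uni = b"shchi".decode("russian-translit")
koi = uni.encode("koi8-r")  # koi8-r ships with Python
```

After this runs, uni holds the two Cyrillic letters and koi holds their two-byte KOI8-R representation.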
HTH,
Martin