transliteration in Python

Martin von Loewis loewis at informatik.hu-berlin.de
Fri Jan 4 12:14:36 EST 2002


Giorgi Lekishvili <gleki at gol.ge> writes:

> Can someone smarter than me explain the common syntax of transliteration
> of one encoding to another?

There is no syntax for doing transliteration. You have to write a
transliteration algorithm.

> Suppose we have string sl="shchi" and want to recode it in "KOI8-R". How
> can this be achieved?

First, write a codec for an encoding called, say, 'russian-translit'.
Its decoding algorithm may contain a fragment that reads

  if input.startswith("shch"):
    input = input[4:]
    output += u"\u0449"  # U+0449 is the small letter; U+0429 is the capital
    # alternatively, write
    # output += u'\N{CYRILLIC SMALL LETTER SHCHA}'
  ...
  if input.startswith("i"):
    input = input[1:]
    output += u'\N{CYRILLIC SMALL LETTER I}'

(assuming that "shchi" is transliterated as SHCHA, I)
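
Spelled out as a complete loop, the fragment above might look like the
following sketch. It uses Python 3 syntax, where every str is already
Unicode. The table TRANSLIT_TABLE, the function name transliterate, and
the rule of passing unmatched characters through unchanged are all
assumptions made for illustration; a real transliteration scheme needs
a much larger table and its own policy for unknown input.

```python
# Hypothetical, deliberately tiny mapping table (Latin sequence -> Cyrillic).
TRANSLIT_TABLE = {
    "shch": "\N{CYRILLIC SMALL LETTER SHCHA}",
    "sh": "\N{CYRILLIC SMALL LETTER SHA}",
    "ch": "\N{CYRILLIC SMALL LETTER CHE}",
    "i": "\N{CYRILLIC SMALL LETTER I}",
}

# Try longer sequences first, so that "shch" wins over "sh".
_KEYS = sorted(TRANSLIT_TABLE, key=len, reverse=True)


def transliterate(text):
    """Greedy longest-match transliteration of text using TRANSLIT_TABLE."""
    output = []
    pos = 0
    while pos < len(text):
        for key in _KEYS:
            if text.startswith(key, pos):
                output.append(TRANSLIT_TABLE[key])
                pos += len(key)
                break
        else:
            # No rule matched: copy the character unchanged (an assumption).
            output.append(text[pos])
            pos += 1
    return "".join(output)
```

Sorting the keys by length is one simple way to make sure the four-letter
rule is tried before the two-letter ones; other matching strategies would
work as well.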

With this codec, you can now write

  uni = unicode("shchi", "russian-translit")
  koi = uni.encode("koi8-r") # the KOI8 codec is already supported
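
In Python 3 the unicode() builtin is gone, so the same two steps would go
through bytes.decode() instead. The following is a minimal sketch of how
such a codec could be registered with the standard codecs module. The
helper names (_translit_decode, _translit_encode, _search) are
hypothetical, the decoder only knows the two rules shown earlier, and the
encoder is left unimplemented.

```python
import codecs


def _translit_decode(data, errors="strict"):
    # The codec machinery may pass a memoryview, so copy to bytes first.
    text = bytes(data).decode("ascii", errors)
    out = []
    pos = 0
    while pos < len(text):
        if text.startswith("shch", pos):
            out.append("\N{CYRILLIC SMALL LETTER SHCHA}")
            pos += 4
        elif text.startswith("i", pos):
            out.append("\N{CYRILLIC SMALL LETTER I}")
            pos += 1
        else:
            # Unknown input is copied unchanged (an assumption).
            out.append(text[pos])
            pos += 1
    # A decoder returns the decoded text and the number of bytes consumed.
    return "".join(out), len(data)


def _translit_encode(text, errors="strict"):
    raise NotImplementedError("this sketch only decodes")


def _search(name):
    # Name normalization differs between Python versions, so compare loosely.
    if name.lower().replace("-", "_").replace(" ", "_") == "russian_translit":
        return codecs.CodecInfo(
            _translit_encode, _translit_decode, name="russian-translit"
        )
    return None


codecs.register(_search)

uni = b"shchi".decode("russian-translit")
koi = uni.encode("koi8-r")  # the KOI8-R codec ships with Python
```

After this runs, uni holds the two Cyrillic letters and koi holds their
KOI8-R bytes.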

HTH,
Martin



