[Tutor] Absolute newbie - Transliteration

Bob Gailer bgailer@alum.rpi.edu
Wed May 21 11:46:01 2003


At 11:51 PM 5/20/2003 -0700, David Rogers wrote:

>Hi
>
>I'm an absolute newbie - this is my first attempt with Python or any 
>"real" language, so my advance apologies for any stupid comments.  I 
>joined the list just to ask this question, after doing a little searching 
>in the list archives and the documentation and not being able to find out 
>what I want to know.
>
>I'm trying to make scripts to transliterate a file from (Unicode) Cyrillic 
>characters to each of
>- Roman script, and
>- International Phonetic Alphabet (more Unicode).
>
>(Whether I end up with separate scripts, one for each transliteration, or 
>one script for all with a bigger dictionary/list/table, is not important 
>to me.)
>
>The transliteration will not always be one-to-one in terms of the number 
>of characters, for example the "ch" sound is one letter in Russian but 
>corresponds to two letters in English.
>
>I have found the following in the Python web documentation...
>
>>translate(table[, deletechars])
>>
>>
>>Return a copy of the string where all characters occurring in the 
>>optional argument deletechars are removed, and the remaining characters 
>>have been mapped through the given translation table, which must be a 
>>string of length 256.
>
>
>...but I don't understand what format my table needs to be in, or even if 
>this accommodates Unicode, or the problem of one character sometimes 
>translating to two.  If I'm completely on the wrong track here, somebody 
>laugh now before it's too late.   :-)
>
>
>What I don't want is a pointer to a non-modifiable Cyrillic-to-Roman 
>transliteration application, because I want to re-use what I do here when 
>I make other transliteration tables to speed up IPA transcription from 
>other languages too.  I love IPA.    :-)
>
>On the other hand, if somebody has already done something like what I 
>want, in a script I can modify for other uses, then I'm all ears.
>(Some of me is ears all the time.)  I'm happy to make the lists, 
>dictionary entries, or whatever format they need to be in - I just want to 
>know how to get Python to read this stuff and then give me back the right 
>thing.
>
>I'm using Mac OS X, if it makes any difference.

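The translate() you quoted takes a 256-character table, so it is for 8-bit
strings, not Unicode, and it can only map one character to one character.
My best guess is a dictionary that maps each Cyrillic character to its
replacement string (which may be more than one character long).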
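
Here is a minimal sketch of the dictionary idea, assuming a current Python 3
where every string is Unicode. The table name CYR_TO_ROMAN, the function name
transliterate, and the handful of mappings are only illustrations, not a
complete or standard romanization; you would fill in your own table (and a
second one for IPA).

```python
# Hypothetical, partial Cyrillic-to-Roman table (illustration only).
# Values may be longer than one character, e.g. "ч" -> "ch".
CYR_TO_ROMAN = {
    "а": "a",
    "й": "y",
    "к": "k",
    "о": "o",
    "т": "t",
    "ч": "ch",
    "ш": "sh",
}


def transliterate(text, table):
    """Return text with each character replaced by its table entry.

    Characters that are not in the table (spaces, punctuation, letters
    you have not mapped yet) are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("чай", CYR_TO_ROMAN))
```

Because the table is ordinary data, the same function can be reused with a
different dictionary for each language or target script. In Python 3,
str.translate also accepts a mapping like this if you build it with
str.maketrans(CYR_TO_ROMAN), so either route should work there.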

Bob Gailer
bgailer@alum.rpi.edu
303 442 2625