transliteration in Python

Jason Orendorff jason at jorendorff.com
Fri Jan 4 04:58:00 EST 2002


> Can someone smarter than me explain the common syntax of transliteration
> of one encoding to another?

No.  But if you can settle for a dumber-than-a-box-of-hammers person,
read on...

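  s1 = b"shchi"           # start with some ascii bytes
  u = s1.decode('ascii')  # decode them into a unicode string
  s2 = u.encode('utf16')  # encode it as UTF-16 bytes
  # open an output file in binary mode (the filename is only an example)
  with open('shchi.out', 'wb') as outfile:
      outfile.write(s2)   # write them to a binary file, for example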

[In this case, I know that s1 is ASCII-encoded, because I typed in the
letters "shchi" and I know that those are all ASCII characters, and Python
and my computer both handle ASCII just fine by default.  But if you
*don't* know the encoding of s1, there is in general no reliable way
to find out, because the bytes don't record how they were encoded.
Sometimes you can make a pretty good heuristic guess.]
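
The same two steps (decode, then encode) should work for any pair of
encodings your Python knows about.  Here is a minimal sketch, assuming
you already know the source encoding; `transcode` is a name I made up
for illustration, not a standard library function.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one encoding to another.

    `transcode` is a hypothetical helper, not part of the standard library.
    It raises UnicodeDecodeError or UnicodeEncodeError if the bytes or
    characters do not fit the named encodings.
    """
    text = data.decode(source_encoding)   # bytes -> unicode string
    return text.encode(target_encoding)   # unicode string -> bytes


ascii_bytes = b"shchi"
# "utf-16-le" is little-endian UTF-16 without a byte order mark
utf16_bytes = transcode(ascii_bytes, "ascii", "utf-16-le")
```

Note that this only changes how the same characters are stored as bytes.
It does not turn the Latin letters "shchi" into Cyrillic ones; that would
need a separate letter-by-letter mapping.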

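Python only supports a limited set of encodings out of the box.  "KOIR-8",
spelled the way you wrote it, isn't a name it recognizes.  The usual name
for that Cyrillic encoding is KOI8-R, which Python's standard library
knows as 'koi8_r'.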
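
If you are not sure whether a given encoding name is available, you can
ask the codec registry.  This is only a sketch: `has_codec` is a helper
name I invented, and the list of names below is just for illustration.
`codecs.lookup` is the real standard library call, and it raises
LookupError for names it cannot find.

```python
import codecs


def has_codec(name):
    """Return True if this Python can find a codec under the given name.

    `has_codec` is a hypothetical helper, not part of the standard library.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Try a few spellings, including the misspelled one
for name in ("ascii", "utf-16", "koi8_r", "KOIR-8"):
    print(name, has_codec(name))
```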

## Jason Orendorff    http://www.jorendorff.com/



