How to convert between Japanese coding systems?

Dietrich Bollmann diresu at web.de
Thu Feb 19 01:28:12 EST 2009


Hi,

Are there any functions in python to convert between different Japanese
coding systems?

I would like to convert between (at least) ISO-2022-JP, UTF-8, EUC-JP
and SJIS.  I also need some function to encode / decode base64 encoded
strings.

I get the strings (which actually are emails) from a server on the
internet with:

  import urllib
  server = urllib.urlopen(serverURL, parameters)
  email = server.read()

The coding systems are given in the response string:

Example:

email = '''[...]
Subject:
=?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
=?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
[...]
Content-Type: text/plain; charset=EUC-JP
[...]
Content-Transfer-Encoding: base64
[...]

cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K

'''

My idea is to first parse the 'email' string and to extract the email
body as well as the values of the 'Subject: ', the 'Content-Type: ' and
the 'Content-Transfer-Encoding: ' attributes and to after use them to
convert them to some other coding system:

Something in the lines of:

  (subject, contentType, contentTransferEncoding, content) =
parseEmail(email)

  to = 'utf-8'
  subjectUtf8 = decodeSubject(subject, to)

  from = contentType
  to = 'utf-8'
  contentUtf8 = convertCodingSystem(decodeBase64(content), from, to)

The only problem is that I could not find any standard functionality to
convert between different Japanese coding systems.

Thanks,

Dietrich Bollmann






More information about the Python-list mailing list