How to convert between Japanese coding systems?

Peter Otten __peter__ at web.de
Thu Feb 19 02:51:03 EST 2009


Dietrich Bollmann wrote:

> I get the strings (which actually are emails) from a server on the
> internet with:
> 
>   import urllib
>   server = urllib.urlopen(serverURL, parameters)
>   email = server.read()
> 
> The coding systems are given in the response string:
> 
> Example:
> 
> email = '''[...]
> Subject:
> =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
> =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
> [...]
> Content-Type: text/plain; charset=EUC-JP
> [...]
> Content-Transfer-Encoding: base64
> [...]
> 
> cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
> 
> '''

Is that an email? Maybe you can get it in a format that is supported by the
email package in the standard library.
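For the message quoted above, a minimal Python 3 sketch using the email package might look like the following. It assumes the server returns a complete RFC 2822 message as bytes, which is what `urlopen(...).read()` gives you in Python 3. The raw message is a hypothetical reconstruction of the quoted one: the elided parts are left out, and the second Subject line is indented so that it parses as a continuation of the same header.

```python
import email
from email.header import decode_header, make_header

# Hypothetical reconstruction of the quoted message; in real use this
# would be the bytes returned by server.read().
raw = (
    b"Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=\n"
    b" =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=\n"
    b"Content-Type: text/plain; charset=EUC-JP\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K\n"
)

# Avoid naming the message "email": that would shadow the email module.
msg = email.message_from_bytes(raw)

# Turn the RFC 2047 encoded words into a single Unicode string.
subject = str(make_header(decode_header(msg["Subject"])))

# Undo the base64 transfer encoding, then decode with the declared charset.
payload = msg.get_payload(decode=True)        # raw EUC-JP bytes
charset = msg.get_content_charset("ascii")    # lower-cased, e.g. "euc-jp"
body = payload.decode(charset)

print(subject)
print(body)
```

Once the subject and body are Unicode strings, re-encoding them into another Japanese coding system is a single `.encode()` call.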

> The only problem is that I could not find any standard functionality to
> convert between different Japanese coding systems.

Then you didn't look hard enough:
 
>>> s = "会社概要".decode("utf8") # I have no idea what that means
>>> s.encode("iso-2022-jp")
'\x1b$B2q<R35MW\x1b(B'
>>> s.encode("euc-jp")
'\xb2\xf1\xbc\xd2\xb3\xb5\xcd\xd7'
>>> s.encode("sjis")
'\x89\xef\x8e\xd0\x8aT\x97v'
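In Python 3 the same round trip uses `bytes.decode()` and `str.encode()` with the codecs Python ships (`sjis` is an alias for `shift_jis`). The helper below is only a sketch, and its name `convert` is made up for illustration. It decodes from the source coding system to Unicode and then encodes into the target one:

```python
def convert(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Re-encode bytes from one coding system into another via Unicode."""
    # Step 1: bytes in the source encoding -> Unicode text.
    text = data.decode(source, errors)
    # Step 2: Unicode text -> bytes in the target encoding.
    return text.encode(target, errors)


euc = "会社概要".encode("euc-jp")
print(euc)                                     # b'\xb2\xf1\xbc\xd2\xb3\xb5\xcd\xd7'
print(convert(euc, "euc-jp", "iso-2022-jp"))   # b'\x1b$B2q<R35MW\x1b(B'
print(convert(euc, "euc-jp", "sjis"))          # b'\x89\xef\x8e\xd0\x8aT\x97v'
```

With the default `errors="strict"`, a character that the target coding system cannot represent raises an exception rather than being silently lost. Pass `"replace"` if lossy output is acceptable.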

See also http://www.amk.ca/python/howto/unicode

Peter


