How to convert between Japanese coding systems?
Peter Otten
__peter__ at web.de
Thu Feb 19 02:51:03 EST 2009
Dietrich Bollmann wrote:
> I get the strings (which actually are emails) from a server on the
> internet with:
>
> import urllib
> server = urllib.urlopen(serverURL, parameters)
> email = server.read()
>
> The coding systems are given in the response string:
>
> Example:
>
> email = '''[...]
> Subject:
> =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=
> =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=
> [...]
> Content-Type: text/plain; charset=EUC-JP
> [...]
> Content-Transfer-Encoding: base64
> [...]
>
> cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K
>
> '''
Is that an email? Maybe you can get it in a format that is supported by the
email package in the standard library.
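For a message like the one quoted above, a minimal Python 3 sketch with the standard `email` package might look like this. It assumes the raw response can be reduced to just the headers and body shown (the `[...]` parts are not reconstructed, and `RAW` is a hypothetical stand-in for `server.read()`). `decode_header`/`make_header` handle the `=?UTF-8?Q?...?=` subject, and `get_payload(decode=True)` undoes the base64 transfer encoding before the bytes are decoded with the declared charset.

```python
import email
from email.header import decode_header, make_header

# Hypothetical raw message: only the headers and body quoted above,
# with the "[...]" parts left out.
RAW = (
    "Subject: =?UTF-8?Q?romaji=E3=81=B2=E3=82=89=E3=81=8C=E3=81=AA=E3=82=AB=E3=82=BF?=\n"
    " =?UTF-8?Q?=E3=82=AB=E3=83=8A=E6=BC=A2=E5=AD=97?=\n"
    "Content-Type: text/plain; charset=EUC-JP\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "cm9tYWpppNKk6aSspMqlq6W/paulyrTBu/oNCg0K\n"
)

msg = email.message_from_string(RAW)

# Join the RFC 2047 encoded words of the (folded) Subject into one str.
subject = str(make_header(decode_header(msg["Subject"])))

# The charset parameter of Content-Type, returned lowercased.
charset = msg.get_content_charset()

# decode=True reverses the base64 transfer encoding and returns bytes,
# which are then decoded with the charset the message declares.
body = msg.get_payload(decode=True).decode(charset)

print(subject)
print(body)
```

Once subject and body are `str`, producing Shift_JIS or ISO-2022-JP is only a matter of calling `.encode()` with that codec name.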
> The only problem is that I could not find any standard functionality to
> convert between different Japanese coding systems.
Then you didn't look hard enough:
>>> s = "会社概要".decode("utf8") # I have no idea what that means
>>> s.encode("iso-2022-jp")
'\x1b$B2q<R35MW\x1b(B'
>>> s.encode("euc-jp")
'\xb2\xf1\xbc\xd2\xb3\xb5\xcd\xd7'
>>> s.encode("sjis")
'\x89\xef\x8e\xd0\x8aT\x97v'
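In Python 3, where `str` is already Unicode, the same conversion is a decode from the source codec followed by an encode to the target codec. A small sketch; `convert` is a hypothetical helper name, not a library function, and the byte strings it produces match the Python 2 session above.

```python
def convert(data: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Re-encode bytes from codec `src` to codec `dst` via a Unicode str."""
    return data.decode(src, errors).encode(dst, errors)


text = "会社概要"
euc = text.encode("euc-jp")

print(euc)                                    # b'\xb2\xf1\xbc\xd2\xb3\xb5\xcd\xd7'
print(convert(euc, "euc-jp", "iso-2022-jp"))  # b'\x1b$B2q<R35MW\x1b(B'
print(convert(euc, "euc-jp", "shift_jis"))    # b'\x89\xef\x8e\xd0\x8aT\x97v'
```

With `errors="strict"`, characters missing from the target codec raise `UnicodeEncodeError`; text produced on Windows may need the `cp932` codec (Microsoft's Shift_JIS variant) rather than `shift_jis`.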
See also http://www.amk.ca/python/howto/unicode
Peter