Question concerning Unicode and or Shift-JIS

Eric Brunel eric_brunel at despammed.com
Fri Mar 12 04:05:10 EST 2004


Antioch wrote:
 > Ok, so I'm a newb Python programmer and I'm trying to create a simple Python
 > web application. The program is simply going to read in pairs of words, parse
 > them into a dictionary file, then randomly display the key and prompt the
 > user for the correct answer. Basically, it's a digital flash card system with
 > a modular "dictionary" file.
 >
 > The problem is this: I'm trying to create this program to help me study
 > foreign languages (specifically Japanese at the moment) and when I save the
 > txt file which houses the word pairs, it is automatically encoded into UTF.
 > However, when getting user input, the input is natively sent to the program
 > in Shift-JIS encoding. I downloaded CJKcodecs for Python to encode a string
 > into any number of Japanese encodings, however the problem is I don't know how
 > to "decode" the UTF and then recode it into Shift-JIS so that I can compare
 > the dictionary values with the input values. OR, I could convert the input
 > from Shift-JIS to UTF, but either way I don't know how to decode any of the
 > codecs. I'm sure there's just some simple function call, but I have been
 > unable to find it.
 >
 > Any help would be appreciated! Thanks =)

If s is a string encoded in UTF-8, converting it to Shift-JIS will be something 
like:

s2 = unicode(s, 'utf-8').encode('shift-jis')

For the reverse:

s = unicode(s2, 'shift-jis').encode('utf-8')
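On Python versions where the unicode built-in is not available (Python 3), the 
same round trip can be sketched with the decode and encode methods. This is 
only an illustration: the sample text and the name s_back are assumptions made 
for the example, not part of the original program.

```python
# Sketch of the UTF-8 <-> Shift-JIS round trip in Python 3 syntax.
text = "日本語"              # sample text, chosen only for illustration
s = text.encode("utf-8")    # bytes holding the UTF-8 encoding

# UTF-8 bytes -> Unicode string -> Shift-JIS bytes
s2 = s.decode("utf-8").encode("shift_jis")

# Shift-JIS bytes -> Unicode string -> UTF-8 bytes
s_back = s2.decode("shift_jis").encode("utf-8")
```

In both directions the data passes through a Unicode string; there is no 
direct byte-to-byte conversion between the two encodings.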

You have to make sure s contains only characters that Shift-JIS can represent 
(and that s2 really is valid Shift-JIS), or the encoding / decoding to / from 
Shift-JIS will fail and you'll get a UnicodeError, which is a subclass of 
ValueError.
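One way to deal with such failures is to catch the exception around the 
conversion. The sketch below uses Python 3 syntax; the helper name 
to_shift_jis and the choice to return None on failure are assumptions made 
for illustration only.

```python
def to_shift_jis(utf8_bytes):
    """Convert UTF-8 bytes to Shift-JIS bytes, or return None on failure.

    Hypothetical helper: returning None is just one possible policy.
    """
    try:
        return utf8_bytes.decode("utf-8").encode("shift_jis")
    except UnicodeError:
        # Covers both invalid UTF-8 input (UnicodeDecodeError) and
        # characters that Shift-JIS cannot represent (UnicodeEncodeError).
        return None
```

Since UnicodeError derives from ValueError, catching ValueError would work as 
well, but it would also hide unrelated errors.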

For further details, see the unicode function at 
http://www.python.org/doc/current/lib/built-in-funcs.html#l2h-71 , the decode 
and encode methods on strings at 
http://www.python.org/doc/current/lib/string-methods.html and the codecs module 
at http://www.python.org/doc/current/lib/module-codecs.html
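For the comparison described in the question, an alternative to re-encoding 
one side is to decode both sides to Unicode and compare the resulting 
strings. A minimal sketch in Python 3 syntax follows; the function name 
same_word and its parameters are hypothetical, and it assumes the dictionary 
file is UTF-8 and the user input is Shift-JIS, as stated in the question.

```python
def same_word(dict_value_utf8, user_input_sjis):
    """Compare a UTF-8 dictionary value with Shift-JIS user input.

    Hypothetical helper: both arguments are byte strings.
    """
    expected = dict_value_utf8.decode("utf-8")
    answer = user_input_sjis.decode("shift_jis")
    # Ignore surrounding whitespace such as a trailing newline.
    return expected.strip() == answer.strip()
```

For example, same_word("ねこ".encode("utf-8"), "ねこ\n".encode("shift_jis")) 
compares equal even though the two byte strings differ.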

HTH
-- 
- Eric Brunel <eric (underscore) brunel (at) despammed (dot) com> -
PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com



