help wanted regarding displaying Japanese characters in a GUI using QT and python
Serge Orlov
Serge.Orlov at gmail.com
Thu Apr 20 07:27:25 EDT 2006
prats wrote:
> Sorry, I did not read your point correctly. It works fine. Thanks for
> your help.
> I have one more query. I was told that the text I was supposed to show
> was written using the "ISO-2022-JP" charset, but it didn't work when I
> decoded it using that charset. It worked fine with the "shift-jis"
> encoding. Is that the default charset used by Python, i.e. are bytes
> "shift-jis" by default?
No, the default charset in Python is ASCII. There is no completely
reliable way to find out the encoding of arbitrary bytes. But if you
have more than ten bytes and you know some properties of the text (for
example, you're sure it contains only English and Japanese), then the
first thing you can do is rule out invalid encodings:
    def valid_en_jp_encodings(data):
        # Pure ASCII decodes under every candidate, so report it directly.
        try:
            data.decode("ascii")
            return ["ascii"]
        except UnicodeDecodeError:
            pass
        # Keep only the encodings under which the data decodes cleanly.
        encodings = "utf-8", "shift-jis", "iso-2022-jp", "euc-jp"
        valid = []
        for encoding in encodings:
            try:
                data.decode(encoding)
                valid.append(encoding)
            except UnicodeDecodeError:
                pass
        return valid
If this function returns a list with only one item, you're lucky. If it
returns more than one item, things get more complicated. You can
try http://chardet.feedparser.org/ to guess the encoding, or you can
present the list of valid encodings to the user and let him/her make a
choice. It is also possible that this function returns an empty
list; in that case you will need to display an error message.
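
The three outcomes described above (one candidate, several candidates, none) could be handled roughly as in the sketch below. It is not part of the original post. It assumes Python 3.7 or later, and the names `candidate_encodings`, `decode_en_jp` and the `choose` callback are hypothetical. It also adds one heuristic of its own: ISO-2022-JP is a 7-bit encoding that switches character sets with ESC sequences, so such data passes a plain ASCII check; the sketch therefore treats 7-bit data containing an ESC byte as probable ISO-2022-JP rather than ASCII.

```python
ENCODINGS = ("utf-8", "shift-jis", "iso-2022-jp", "euc-jp")


def candidate_encodings(data):
    """Return the encodings under which `data` (a bytes object) decodes cleanly."""
    if data.isascii():
        if b"\x1b" not in data:
            return ["ascii"]
        # Assumption of this sketch: 7-bit data with ESC sequences that
        # decodes as ISO-2022-JP is most likely ISO-2022-JP.
        try:
            data.decode("iso-2022-jp")
            return ["iso-2022-jp"]
        except UnicodeDecodeError:
            pass
    valid = []
    for encoding in ENCODINGS:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        valid.append(encoding)
    return valid


def decode_en_jp(data, choose=None):
    """Decode `data`, returning (text, encoding).

    `choose` is a hypothetical callback that receives the list of
    candidates and returns one of them, for example after asking the
    user in a dialog.
    """
    candidates = candidate_encodings(data)
    if not candidates:
        # Nothing fits: the caller should show an error message.
        raise ValueError("no candidate encoding can decode this data")
    if len(candidates) == 1:
        encoding = candidates[0]
    elif choose is not None:
        encoding = choose(candidates)
    else:
        raise ValueError("ambiguous encoding, candidates: %s" % ", ".join(candidates))
    return data.decode(encoding), encoding
```

In a GUI, `choose` would typically open a selection dialog, and the `ValueError` would be caught and turned into the error message mentioned above. The ESC heuristic is only a guess and can be dropped if the data may legitimately contain escape characters.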