help wanted regarding displaying Japanese characters in a GUI using QT and python

Serge Orlov Serge.Orlov at gmail.com
Thu Apr 20 07:27:25 EDT 2006


prats wrote:
> Sorry, I did not read your point correctly. It works fine. Thanks for
> your help.
> I have one more query. I was told that the text I was supposed to show
> was written using the "ISO-2022-JP" charset, but decoding it with that
> charset didn't work. It worked fine with the "shift-jis" encoding,
> though. Is that the default charset used by Python, i.e. are bytes
> "shift-jis" by default?

No, the default charset in Python is ASCII. There is no absolutely
reliable way to find out the encoding of arbitrary bytes. But if you
have more than ten bytes and you know some properties of the text (for
example, you're sure it contains only English and Japanese), then the
first thing you can do is rule out the invalid encodings:

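def valid_en_jp_encodings(data):
    """Return the candidate encodings in which data decodes cleanly."""
    try:
        data.decode("ascii")
        # ISO-2022-JP is also pure 7-bit; it switches character sets with
        # ESC (0x1b) sequences, so only ESC-free data is plain ASCII.
        if b"\x1b" not in data:
            return ["ascii"]
    except UnicodeDecodeError:
        pass
    encodings = "utf-8", "shift-jis", "iso-2022-jp", "euc-jp"
    valid = []
    for encoding in encodings:
        try:
            data.decode(encoding)
            valid.append(encoding)
        except UnicodeDecodeError:
            pass
    return valid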
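
To see why your data failed under one charset and worked under another,
here is a small sketch that runs the same kind of check on sample bytes.
The sample string is made up for illustration and is not your actual
text. Shift-JIS uses bytes above 0x7f for Japanese characters, while
ISO-2022-JP is a 7-bit encoding, so Shift-JIS data should be rejected by
the ISO-2022-JP decoder:

```python
# Hypothetical sample: "Python" followed by three kanji, encoded as Shift-JIS.
sample = u"Python \u65e5\u672c\u8a9e"
data = sample.encode("shift-jis")

# Try each candidate decoder and record whether the bytes are valid in it.
results = {}
for encoding in ("ascii", "utf-8", "shift-jis", "iso-2022-jp", "euc-jp"):
    try:
        data.decode(encoding)
        results[encoding] = True
    except UnicodeDecodeError:
        results[encoding] = False
    print(encoding, results[encoding])
```

The "shift-jis" line reports True, and "iso-2022-jp" reports False
because of the high bytes.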

If this function returns a list with only one item, you're lucky. If it
returns more than one item, things get more complicated. You can
try http://chardet.feedparser.org/ to guess the encoding, or you can
present the list of valid encodings to the user and let them make the
choice. It is also possible for this function to return an empty
list; in that case you will need to display an error message.
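
The three cases could be handled along these lines. This is only a
sketch: pick_encoding and the ask_user callback are names I made up, and
in your program ask_user would be whatever Qt dialog you use to show the
choices.

```python
def pick_encoding(candidates, ask_user):
    """Reduce a list of candidate encodings to a single one.

    candidates -- list of encoding names that decoded without error
    ask_user   -- hypothetical callback: takes the list, returns one name
    """
    if not candidates:
        # Nothing decoded cleanly: the caller should show an error message.
        raise ValueError("text is not valid in any of the expected encodings")
    if len(candidates) == 1:
        # The lucky case: only one encoding fits.
        return candidates[0]
    # Several encodings fit: let the user (or a guesser) decide.
    return ask_user(candidates)

# Stand-in callbacks instead of a real dialog, for demonstration only.
print(pick_encoding(["shift-jis"], ask_user=lambda names: names[0]))
print(pick_encoding(["utf-8", "euc-jp"], ask_user=lambda names: names[1]))
```

Raising an exception for the empty list keeps the error message in the
GUI code, where it can be shown to the user.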



