[Tutor] encodings

Magnus Lyckå magnus@thinkware.se
Tue Jul 1 07:49:01 2003


I suppose you meant for this to go to the mailing list. You seem
to have sent it to me.

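It seems to me that GB2312 is a multi-byte encoding: each Chinese
character takes two bytes. The gb2312.py that gencodec.py generated for
you is a character-map codec (note the charmap_decode call in your
traceback), and I believe those translate one byte at a time, so a
two-byte table can't work with it. As far as I know, Python handles
8-bit strings and Unicode, and I don't know how to handle non-Unicode
multi-byte strings with what ships in Python 2.3.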
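
To illustrate the byte widths, here is a minimal sketch. It assumes a
Python version whose standard library bundles a 'gb2312' codec, which is
not the case for the Python 2.3 installation discussed in this thread.
The sample string is just an illustrative choice.

```python
# Sketch only: assumes the interpreter ships a built-in 'gb2312' codec.
sample = u"\u4e2d\u6587"                  # two Chinese characters (illustrative choice)

encoded = sample.encode("gb2312")         # Unicode -> GB2312 bytes
ascii_encoded = u"abcd".encode("gb2312")  # plain ASCII text

# Each Chinese character takes two bytes; each ASCII character takes one.
print(len(sample), len(encoded))
print(len(u"abcd"), len(ascii_encoded))

# Decoding the bytes gives back the original Unicode string.
decoded = encoded.decode("gb2312")
print(decoded == sample)
```

Because the width varies per character, a codec for this encoding has to
look at more than one byte at a time.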

You will probably find more help on comp.lang.python or on the i18n-sig
mailing list. See http://mail.python.org/mailman/listinfo/i18n-sig

At 09:25 2003-06-28 +0800, you wrote:
>The new problem:
>My default charset is Chinese GB2312.
>In ./python23/lib/encodings/ I found no gb2312 encoder/decoder,
>so I got the GB2312 charset map from
>ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/GB2312.TXT
>ran /Python23/Tools/Scripts/gencodec.py to generate gb2312.py,
>and put gb2312.py into /python23/lib/encodings/.
>In IDLE 0.8:
> >>> import codecs
> >>> codecs.lookup('gb2312')
>(<bound method Codec.encode of <encodings.gb2312.Codec instance at 
>0x01A073F0>>, <bound method Codec.decode of <encodings.gb2312.Codec 
>instance at 0x01A07FD0>>, <class encodings.gb2312.StreamReader at 
>0x010F04E0>, <class encodings.gb2312.StreamWriter at 0x010F04B0>)
>
>Looks fine!
> >>> text='???' #chinese char
> >>> text.decode('gb2312')
>Traceback (most recent call last):
>   File "<pyshell#28>", line 1, in ?
>     text.decode('gb2312')
>   File "C:\Python23\lib\encodings\gb2312.py", line 22, in decode
>     return codecs.charmap_decode(input,errors,decoding_map)
>UnicodeDecodeError: 'charmap' codec can't decode byte 0xbd in position 0: 
>character maps to <undefined>
>
>Why?
>Another one:
> >>> text=u'abcd'
> >>> text.encode('gb2312')
>Traceback (most recent call last):
>   File "<pyshell#32>", line 1, in ?
>     text.encode('gb2312')
>   File "C:\Python23\lib\encodings\gb2312.py", line 18, in encode
>     return codecs.charmap_encode(input,errors,encoding_map)
>UnicodeEncodeError: 'charmap' codec can't encode characters in position 
>0-3: character maps to <undefined>
>
>What should I do?
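
Both tracebacks look like what a character-map codec would produce when
it is handed a two-byte table. The sketch below imitates that with a
one-entry table. The entry is only illustrative, and the real generated
gb2312.py may differ in detail. It uses the same codecs.charmap_decode
and codecs.charmap_encode calls that appear in the tracebacks, written
for a current Python where byte strings are bytes objects. The helper
names try_decode and try_encode are hypothetical.

```python
import codecs

# Illustrative one-entry table: a two-byte GB2312 code -> a Unicode code point.
decoding_map = {0x2121: 0x3000}
# Reverse table for encoding, built by inverting the decoding table.
encoding_map = dict((v, k) for k, v in decoding_map.items())


def try_decode(data):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return codecs.charmap_decode(data, "strict", decoding_map)[0]
    except UnicodeDecodeError as exc:
        return "decode failed: %s" % exc


def try_encode(text):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return codecs.charmap_encode(text, "strict", encoding_map)[0]
    except UnicodeEncodeError as exc:
        return "encode failed: %s" % exc


# The codec looks up each single byte value, never the two-byte key,
# so the first byte of a GB2312 character is already "undefined".
decode_result = try_decode(b"\xbd\xd0")
# The table holds no ASCII entries either, so plain 'abcd' cannot be encoded.
encode_result = try_encode(u"abcd")
print(decode_result)
print(encode_result)
```

If that is what is happening, the codec would have to be a real
multi-byte codec rather than something generated from a mapping table.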

--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language