how to transfer my utf8 code saved in a file to gbk code

John Machin sjmachin at lexicon.net
Sun Jun 7 11:25:15 EDT 2009


On Jun 7, 10:55 pm, higer <higerinbeij... at gmail.com> wrote:
> My file contains strings such as:
> \xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a

Are you sure? Does that occupy 9 bytes in your file or 36 bytes?
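To make the 9-versus-36 distinction concrete, here is a small sketch (Python 3 syntax, unlike the Python 2 snippets elsewhere in this message). The names real_bytes and escaped_text are only illustrative; they stand for the two things your file might contain.

```python
# Case A: the file holds the actual UTF-8 bytes (3 characters, 3 bytes each).
real_bytes = b'\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a'

# Case B: the file holds the literal ASCII text backslash, 'x', two hex digits,
# repeated 9 times. The r prefix keeps the backslashes as ordinary characters.
escaped_text = rb'\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a'

print(len(real_bytes))             # 9
print(len(escaped_text))           # 36
print(escaped_text.count(b'\\'))   # 9
```

In case A a plain decode('utf8') is enough; in case B the escapes have to be turned back into bytes first.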

>
> I want to read the content of this file and convert it to the
> corresponding GBK code, a Chinese character encoding.
> Every time I tried to convert it, the output was the same no
> matter which method I used.
> It seems that when Python reads it, Python takes '\' as an
> ordinary character, so the string ends up represented as
> "\\xe6\\x97\\xa5\\xe6\\x9c\\x9f\\xef\\xbc\\x9a". The "\" is then
> output 'correctly', but that's not what I want to get.
>
> Can anyone help me?
>

Try this:

utf8_data = your_data.decode('string-escape')
unicode_data = utf8_data.decode('utf8')
# The unicode derived from your sample looks like this: 日期:
# Is that what you expected?
gbk_data = unicode_data.encode('gbk')
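
The snippet above is Python 2. The 'string-escape' codec does not exist in Python 3, so here is a rough Python 3 sketch of the same three steps. It assumes the data really is the 36-byte escaped ASCII text; the function name escaped_utf8_to_gbk is made up for illustration, and the unicode_escape/latin-1 round trip is a substitute technique, not the same codec.

```python
def escaped_utf8_to_gbk(raw: bytes) -> bytes:
    """Convert ASCII text containing \\xNN escapes of UTF-8 bytes to GBK bytes."""
    # Step 1: interpret the \xNN escapes. unicode_escape maps each \xNN to the
    # code point U+00NN, and latin-1 maps that code point back to the byte 0xNN.
    utf8_bytes = raw.decode('unicode_escape').encode('latin-1')
    # Step 2: decode the recovered UTF-8 bytes to text.
    text = utf8_bytes.decode('utf-8')
    # Step 3: encode the text as GBK.
    return text.encode('gbk')


sample = rb'\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a'
gbk_data = escaped_utf8_to_gbk(sample)
print(repr(gbk_data))
print(gbk_data.decode('gbk'))
```

This only behaves as described when the input is pure ASCII; if the file mixes real non-ASCII bytes with escapes, the latin-1 step needs rethinking.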

If that "doesn't work", do three things:
(1) Give us some unambiguous hard evidence about the contents of your
data, e.g.:
# assuming Python 2.x
your_data = open('your_file.txt', 'rb').read(36)
print repr(your_data)
print len(your_data)
print your_data.count('\\')
print your_data.count('x')

(2) Show us the source of the script that you used.
(3) Tell us what "doesn't work" means in this case.
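
If you are on Python 3 rather than 2.x, the evidence-gathering in (1) could look roughly like this. The helper name describe is hypothetical, and the temporary file merely stands in for your_file.txt so that the sketch runs on its own.

```python
import os
import tempfile


def describe(path, limit=36):
    """Read up to `limit` bytes in binary mode and summarise what is there."""
    with open(path, 'rb') as f:
        data = f.read(limit)
    return {
        'repr': repr(data),
        'length': len(data),
        'backslashes': data.count(b'\\'),
        'x_count': data.count(b'x'),
    }


# Stand-in for your_file.txt: here it holds the escaped (36-byte) form.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(rb'\xe6\x97\xa5\xe6\x9c\x9f\xef\xbc\x9a')
    demo_path = tmp.name

report = describe(demo_path)
os.remove(demo_path)
print(report)
```

A length of 36 with 9 backslashes and 9 'x' characters means the file holds escaped text; a length of 9 with no backslashes means it holds the raw UTF-8 bytes.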

Cheers,
John




