changing string encoding to different charset?

Philip Semanchuk philip at semanchuk.com
Sun Dec 14 10:07:36 EST 2008


On Dec 14, 2008, at 9:21 AM, Daniel Woodhouse wrote:

> Is it possible to re-encode a string to a different character set in
> python?  To be more specific, I want to change a text file encoded in
> windows-1251 to UTF-8.
> I've tried using string.encode, but get the error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0:
> ordinal not in range(128)

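Without seeing your code, I can't be sure, but I suspect that you first
need to decode the file's contents to Unicode. In Python 2, calling
encode() on a byte string makes Python decode it implicitly with the
default 'ascii' codec first, which is where the UnicodeDecodeError
comes from.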

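# Untested --
# Read the raw bytes; binary mode avoids newline translation on Windows
s = open("in.txt", "rb").read()

# "cp1251" is Python's codec name; "windows-1251" is an alias for it
s = s.decode("cp1251")

assert isinstance(s, unicode)

s = s.encode("utf-8")

open("out.txt", "wb").write(s)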
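
The snippet above targets Python 2, where file contents are byte strings and unicode is a separate type. As a rough sketch of the same decode-then-encode idea on Python 3, where open() takes an encoding argument, something like the following should work. The function name reencode_file and its parameter names are made up for illustration and are not part of any library.

```python
def reencode_file(src_path, dst_path, src_encoding="cp1251", dst_encoding="utf-8"):
    """Read src_path as src_encoding and write the text to dst_path as dst_encoding.

    Hypothetical helper for illustration; assumes the file fits in memory.
    """
    # newline="" disables newline translation, so line endings pass through as-is.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# The same two steps on an in-memory byte string:
raw = b"\xce"                    # the byte from the error message
text = raw.decode("cp1251")      # bytes -> str (U+041E, CYRILLIC CAPITAL LETTER O)
utf8 = text.encode("utf-8")      # str -> bytes, b"\xd0\x9e"
```

Usage would be along the lines of reencode_file("in.txt", "out.txt"). If the source file is not really windows-1251, the result will be readable-looking garbage rather than an error, since cp1251 maps almost every byte to some character.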




