changing string encoding to different charset?
Philip Semanchuk
philip at semanchuk.com
Sun Dec 14 10:07:36 EST 2008
On Dec 14, 2008, at 9:21 AM, Daniel Woodhouse wrote:
> Is it possible to re-encode a string to a different character set in
> python? To be more specific, I want to change a text file encoded in
> windows-1251 to UTF-8.
> I've tried using string.encode, but get the error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
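In Python 2, calling .encode() on a byte string (str) makes Python first decode it with the default ASCII codec. Any byte above 0x7F, such as the 0xCE in the message above, therefore fails before the UTF-8 encoding step is ever reached. Python 3 has no such implicit decode, so the minimal sketch below performs that hidden step explicitly to reproduce the same message. The byte value comes from the traceback; the rest is illustrative.

```python
# Python 3 has no implicit decode, so perform the failing step by hand.
raw = b"\xce"  # the byte from the traceback; Cyrillic capital "О" in cp1251

message = ""
try:
    raw.decode("ascii")  # roughly what Python 2's str.encode() did first
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```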
Without seeing your code I can't be sure, but I suspect you first need to
decode the file's contents to Unicode.
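The decode-then-encode order can be shown on a short in-memory sample. This Python 3 sketch builds its own windows-1251 bytes (the word "Привет" is arbitrary sample data, not from the original file), decodes them with the cp1251 codec, and re-encodes the resulting text as UTF-8.

```python
# Sample bytes as they might appear in a windows-1251 file (made up for the demo).
cp1251_bytes = "Привет".encode("cp1251")

text = cp1251_bytes.decode("cp1251")  # step 1: bytes -> Unicode text
utf8_bytes = text.encode("utf-8")     # step 2: Unicode text -> UTF-8 bytes

print(cp1251_bytes)  # one byte per Cyrillic letter
print(utf8_bytes)    # two bytes per Cyrillic letter
```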
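# Untested (Python 2)
f = open("in.txt", "rb")        # read the raw windows-1251 bytes
s = f.read()
f.close()
s = s.decode("cp1251")          # bytes -> unicode; "windows-1251" is an alias
assert isinstance(s, unicode)
s = s.encode("utf-8")           # unicode -> UTF-8 bytes
f = open("out.txt", "wb")
f.write(s)
f.close()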
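On Python 3, where file() and the unicode type no longer exist, the same conversion can be sketched by letting open() decode while reading and encode while writing. convert_file is a hypothetical helper name, not part of any library, and its defaults assume the source really is windows-1251 (cp1251). newline="" is used on both files so line endings pass through unchanged.

```python
import shutil


def convert_file(src_path, dst_path, src_encoding="cp1251", dst_encoding="utf-8"):
    """Hypothetical helper: re-encode a text file from src_encoding to dst_encoding."""
    # open() decodes the source into str and encodes str into the destination;
    # newline="" disables newline translation on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Copy in chunks so large files are not read into memory at once.
        shutil.copyfileobj(src, dst)
```

Usage, with the file names from the post: convert_file("in.txt", "out.txt"). A byte that has no mapping in cp1251 (0x98, for example) will raise UnicodeDecodeError, because open() uses strict error handling by default.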