encoding latin1 to utf-8

J. Clifford Dyer jcd at sdf.lonestar.org
Mon Sep 10 08:48:24 EDT 2007


On Mon, Sep 10, 2007 at 12:25:46PM -0000, Harshad Modi wrote regarding encoding latin1 to utf-8:
> 
> Hello,
>  I wrote a function for converting latin1 to utf-8, but I think it does
> not work properly.
> Please guide me.
> 
> It does not give the proper result: I got "Belgi???" using this
> method (Belgium):
> 
> import codecs
> import sys
> # Encoding / decoding functions
> def encode(filename):
>  file = codecs.open(filename, encoding="latin-1")
>  data = file.read()
>  file = codecs.open(filename,"wb", encoding="utf-8")
>  file.write(data)
> 
> file_name=sys.argv[1]
> encode(file_name)

Some tips to help you out. 

1.  Close your filehandles when you're done with them.
2.  Don't shadow builtin names.  Python already uses the name file for a builtin, and rebinding it to your own file object can have ugly side effects that manifest down the road.

So perhaps try the following:

import codecs

def encode(filename):
    # Read the whole file, decoding its bytes as latin-1.
    read_handle = codecs.open(filename, encoding='latin-1')
    data = read_handle.read()
    read_handle.close()
    # Overwrite the same file, encoding the text as utf-8.
    write_handle = codecs.open(filename, 'wb', encoding='utf-8')
    write_handle.write(data)
    write_handle.close()
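
If your Python has the with statement and io.open (2.6 or later, as far as I know), the handles can close themselves even when something raises partway through.  A minimal sketch of the same idea follows; the name latin1_to_utf8 is just my own choice for illustration, and newline='' is there only to keep line endings untouched.

```python
import io

def latin1_to_utf8(filename):
    # Read the whole file, decoding its bytes as Latin-1.
    # newline='' turns off newline translation so line endings survive as-is.
    with io.open(filename, encoding='latin-1', newline='') as read_handle:
        data = read_handle.read()
    # Overwrite the same file, encoding the text as UTF-8.
    with io.open(filename, 'w', encoding='utf-8', newline='') as write_handle:
        write_handle.write(data)
```

Each handle is closed as soon as its with block ends, so there are no explicit close() calls to forget.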

For what it's worth, though, I couldn't reproduce your problem with either your code or mine.  This is not too surprising, as all ASCII characters are encoded identically in UTF-8 and Latin-1.  So your program should output exactly the same file as it reads if the contents of the file are just "Belgium".
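
To see the point about ASCII for yourself, here is a quick check.  The accented string is purely my guess at the kind of data that would behave differently (the Dutch spelling "België"); I don't know what your file actually holds.

```python
ascii_text = u'Belgium'
accented_text = u'Belgi\xeb'  # "België": hypothetical sample, not taken from the original file

# Pure ASCII text produces the same bytes in both encodings.
assert ascii_text.encode('latin-1') == ascii_text.encode('utf-8')

# A non-ASCII character is one byte in Latin-1 but two bytes in UTF-8.
latin1_bytes = accented_text.encode('latin-1')
utf8_bytes = accented_text.encode('utf-8')
print(len(latin1_bytes), len(utf8_bytes))
```

So a file only changes on disk when it contains characters outside the ASCII range.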

Cheers,
Cliff


