Trouble with unicode

M.-A. Lemburg mal at lemburg.com
Tue May 15 10:10:00 EDT 2001


Charlie Clark wrote:
> 
> >First you should check which encoding your Unicode file uses
> >(e.g. sometimes Unicode refers to UTF-16 or just UTF-16-LE). Then
> >you should read the file using codecs.open():
> Actually I now know that it is latin-1
> 
> ># replace encoding with 'utf-16' or 'utf-16-le' or 'utf-16-be'
> >f = codecs.open(filename, 'rb', encoding)
> >contents = f.read()
> >f.close()
> This is exactly what I was looking for. The only thing is having to use the
> codec to read the file. I had expected something like
> f = open(filename, "r")
> contents = f.read()
> contents = codecs.decode(contents, encoding)

codecs.open() places a codec wrapper around the file object
which provides (more or less) seamless encoding/decoding.

You can write Unicode using the .write() method and the
wrapped file object will encode it using the given encoding.
.read() will do the same in the other direction, i.e. it returns
Unicode.
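
A minimal sketch of that round trip, assuming a Python where
codecs.open() is available. The helper name roundtrip() and the use of
a temporary file are only for illustration and are not part of the
codecs module:

```python
import codecs
import os
import tempfile


def roundtrip(text, encoding):
    """Write text through a codec wrapper, then read it back.

    Returns (raw_bytes, decoded_text). Hypothetical helper for
    illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # .write() accepts Unicode; the wrapper encodes it on the way out.
        f = codecs.open(path, 'wb', encoding)
        f.write(text)
        f.close()

        # Look at what actually landed on disk, without any decoding.
        raw = open(path, 'rb')
        raw_bytes = raw.read()
        raw.close()

        # .read() decodes on the way in and returns Unicode.
        f = codecs.open(path, 'rb', encoding)
        contents = f.read()
        f.close()
    finally:
        os.remove(path)
    return raw_bytes, contents


raw_bytes, contents = roundtrip(u'D\xfcsseldorf', 'latin-1')
```

Here raw_bytes holds the Latin-1 encoded data as stored in the file,
while contents is the decoded Unicode text again.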
 
> or should I expect to start opening files with "rb" and an argument in the
> future? I like the way Python encourages a standard way of doing things.

We are thinking about enhancing the builtin open() to also
handle encoded files. Basically, the codecs.open() mechanism
would take over from the plain open() one whenever an encoding
is given.
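
For readers on a later Python: the builtin open() in Python 3 does
accept an encoding argument, so the idea can be sketched as follows.
The helper name read_text() is hypothetical, and this shows today's
behaviour rather than anything that existed when this was written:

```python
def read_text(path, encoding):
    """Read a whole text file, decoding it with the given encoding.

    Hypothetical helper; relies on Python 3's open(..., encoding=...).
    """
    f = open(path, 'r', encoding=encoding)
    try:
        # In text mode with an encoding, .read() returns decoded text.
        return f.read()
    finally:
        f.close()
```

Called as read_text(filename, 'latin-1'), this plays the same role as
the codecs.open() version, without the explicit codec wrapper.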
 
> >Now you can convert the Unicode object contents into a plain
> >string using some other encoding, e.g. Latin-1, and then
> >write it back to a text file:
> >
> would do, if that was all I was doing with it. But it works fine as it is.
> 
> Thanx a lot!!!
> 
> Charlie Clark
> 
> --
> Charlie Clark
> Helmholtzstr. 20
> Düsseldorf
> D- 40215

Greetings from Düsselstraße ;-)

> Tel: +49-211-938-5360
> http://www.begeistert.org

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/



