Trouble with unicode

Charlie Clark charlie at begeistert.org
Tue May 15 09:02:46 EDT 2001


>First you should check which encoding your Unicode file uses
>(e.g. sometimes Unicode refers to UTF-16 or just UTF-16-LE). Then
>you should read the file using codecs.open():
Actually, I now know that it is Latin-1.

># replace encoding with 'utf-16' or 'utf-16-le' or 'utf-16-be'
>f = codecs.open(filename, 'rb', encoding)
>contents = f.read()
>f.close()
This is exactly what I was looking for. The only surprise is having to use the 
codecs module to read the file. I had expected something like
f = open(filename, "r")
contents = f.read()
contents = codecs.decode(contents, encoding)

or should I expect to start opening files with "rb" and an encoding argument in 
the future? I like the way Python encourages a standard way of doing things.
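
For reference, here is a minimal sketch of the two-step "read, then decode" approach described above, next to a one-step read. It assumes Python 3, where bytes objects have a decode() method and the built-in open() accepts an encoding argument. The helper names read_then_decode and read_decoded, and the temporary sample file, are made up for illustration and are not from the original thread.

```python
import os
import tempfile


def read_then_decode(filename, encoding):
    """Read the raw bytes first, then decode them in a separate step."""
    with open(filename, "rb") as f:
        raw = f.read()
    return raw.decode(encoding)


def read_decoded(filename, encoding):
    """Let open() decode while reading (Python 3 built-in behaviour)."""
    with open(filename, "r", encoding=encoding) as f:
        return f.read()


# Hypothetical sample data: a short Latin-1 encoded file.
sample = "D\u00fcsseldorf"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("latin-1"))

two_step = read_then_decode(path, "latin-1")
one_step = read_decoded(path, "latin-1")
os.remove(path)
```

Both helpers should return the same text for the same file. The one-step form is usually shorter, while the two-step form may be handy when the encoding is only known after inspecting the bytes.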

>Now you can convert the Unicode object contents into a plain
>string using some other encoding, e.g. Latin-1, and then
>write it back to a text file:
>
That would do, if that was all I was doing with it. But it works fine as it is.
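
The snippet from the quoted message is not reproduced here, so what follows is not that code. It is only a rough sketch of the kind of conversion the quoted text describes (encoding a Unicode string as Latin-1 and writing it to a file), assuming Python 3. The helper name write_encoded and the sample text are hypothetical.

```python
import os
import tempfile


def write_encoded(filename, text, encoding):
    """Encode a Unicode string and write the resulting bytes to a file."""
    with open(filename, "wb") as f:
        f.write(text.encode(encoding))


# Hypothetical sample text containing one non-ASCII character.
text = "D\u00fcsseldorf"
fd, path = tempfile.mkstemp()
os.close(fd)

write_encoded(path, text, "latin-1")

# Read the bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    written = f.read()
os.remove(path)
```

In Latin-1 each character of this sample occupies a single byte. Characters outside Latin-1 would make encode() raise UnicodeEncodeError unless an errors argument is passed.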

Thanks a lot!

Charlie Clark


-- 
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
http://www.begeistert.org




