Unicode (UTF8) in dbhash on 2.5

Jerry Hill malaclypse2 at gmail.com
Tue Oct 21 11:07:54 EDT 2008


On Tue, Oct 21, 2008 at 10:16 AM, Yves Dorfsman <yves at zioup.com> wrote:
> My terminal is set up in UTF-8, and... It did print correctly. I expected
> that by setting coding: utf-8, all the I/O functions would do the encoding
> for me, because if they don't then I, and everybody who writes a script, will
> need to subclass every single I/O class (ok, except for print !).

No, you don't.  You just need to use the tools provided for you in the
standard library, like this:

import codecs
in_file = codecs.open('my_utf8_file.txt', 'r', 'utf8')

Now your file full of UTF-8 encoded bytes will be automatically
decoded into unicode strings as you read them in.  You can do the
same thing on the output side (obviously, using mode 'w' instead of
'r').
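
For instance, a round trip might look something like the sketch below.
The helper name write_then_read and the temporary file name are made up
for illustration; the only library call it relies on is codecs.open.

```python
import codecs
import os
import tempfile

def write_then_read(text, path):
    # Hypothetical helper: write unicode text out as UTF-8, then read it back.
    out_file = codecs.open(path, 'w', 'utf8')
    try:
        out_file.write(text)      # unicode in, UTF-8 bytes on disk
    finally:
        out_file.close()

    in_file = codecs.open(path, 'r', 'utf8')
    try:
        return in_file.read()     # UTF-8 bytes on disk, unicode out
    finally:
        in_file.close()

# Illustrative path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'my_utf8_output.txt')
result = write_then_read(u'caf\xe9 \u20ac', path)
```

The explicit try/finally is there because the with statement is not
available by default on 2.5.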

If you need to wrap things other than files, the codecs module has the
tools to do that too.
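
As a rough sketch of what I mean, codecs.getreader and codecs.getwriter
return wrapper classes you can put around any byte stream.  Here
io.BytesIO stands in for whatever stream you actually have (that choice
is my assumption for the example; on 2.5, which has no io module, a
StringIO.StringIO object would play the same role).

```python
import codecs
import io

# Reading: wrap a byte stream so that reads return unicode strings.
raw_in = io.BytesIO(u'na\xefve\n'.encode('utf8'))
reader = codecs.getreader('utf8')(raw_in)
line = reader.readline()          # decoded unicode text

# Writing: wrap a byte stream so that unicode is encoded on write.
raw_out = io.BytesIO()
writer = codecs.getwriter('utf8')(raw_out)
writer.write(u'na\xefve\n')
encoded = raw_out.getvalue()      # the UTF-8 bytes that were written
```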

-- 
Jerry


