[Tutor] Unicode (utf-16-le) (fwd)

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Aug 25 11:40:08 EDT 2003


Hi Derek,


Let me redirect your question to Python-Tutor.  This will help keep
everyone in the loop, and it also lets someone else answer your question
if I'm too lazy to answer.  *grin*


If our file is in 'utf-16-le' format, we may want to use the StreamReaders
in:

    http://www.python.org/doc/lib/module-codecs.html

This 'codecs' module implements its own version of open() that knows about
the standard encodings, so hopefully:

###
import codecs
f = codecs.open('my-utf-16-le-encoded-file.utf16le',
                'r',
                'utf-16-le')
###

should be enough to decode utf-16-le files on the fly.
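
Here is a minimal sketch of that idea, written for current Python 3 (an assumption, since this thread predates it).  It writes a small sample file, reads it back through codecs.open(), and then encodes the text as ASCII.  The file name, the sample text, and the 'replace' error policy are all made up for illustration:

```python
import codecs
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.utf16le')
with open(path, 'wb') as out:
    out.write('hello world\n'.encode('utf-16-le'))

# codecs.open() wraps the file in a StreamReader that decodes on the fly.
with codecs.open(path, 'r', 'utf-16-le') as f:
    text = f.read()

# Any character outside ASCII would be replaced by '?' under this policy.
ascii_bytes = text.encode('ascii', 'replace')
print(ascii_bytes)
```

In recent Python versions the built-in open(path, encoding='utf-16-le') should do the same job, so the codecs module is not strictly needed there.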




---------- Forwarded message ----------
Date: Mon, 25 Aug 2003 16:26:44 +1000
From: Derek at leder <derek at leder.com.au>
To: Danny Yoo <dyoo at hkn.eecs.berkeley.edu>
Subject: Re: [Tutor] Unicode (utf-16-le)

Thanks Danny,

At 25/08/2003 09:05 AM, you wrote:
>On Sat, 23 Aug 2003, Derek at leder wrote:
>
>> Does anyone know of a module for converting utf-16-le to ascii?
>> before I write one.
>
>Hi Derek,
>
>Strings support other character sets through the 'encode()' and 'decode()'
>methods.  For example:
>
>###
>>>> s = 'hello world'
>>>> utf_txt = s.encode('utf-16')
>>>> utf_txt
>'\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00'

This is almost there.  Notice, though, that hex escapes are mixed in with
ASCII: \xff and \xfe are hex escapes, but the 'h' in \xfeh is ASCII.

>>>> utf_txt.decode('utf-16')
>u'hello world'

While this works in the interactive interpreter, it does not work when run
from a file.

Any other clues? :-)

>###
>
>Here's a partial list of the character encodings that Python supports:
>
>    http://www.python.org/doc/lib/node126.html
>
>According to that list, 'utf-16-le' is a codec that it can handle, so you
>should be in good shape.  *grin*
>
>
>Good luck to you!
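
On the mixed hex and ASCII in the repr quoted above: the sketch below (Python 3 syntax, which is an assumption for this thread) suggests that nothing is actually mixed up.  '\xff\xfe' is the little-endian byte order mark, and each following pair such as 'h\x00' is one two-byte code unit whose first byte happens to be printable.  The variable names are illustrative only:

```python
import codecs

s = 'hello world'

# 'utf-16-le' fixes the byte order and writes no byte order mark (BOM).
le_bytes = s.encode('utf-16-le')
print(le_bytes[:4])

# The repr shows printable bytes as characters and the rest as \x escapes,
# so 'h\x00' is simply the two bytes 0x68 0x00.
print(list(le_bytes[:2]))

# Prepending the little-endian BOM gives data that the generic 'utf-16'
# codec can decode without being told the byte order.
with_bom = codecs.BOM_UTF16_LE + le_bytes
print(with_bom[:2])

# Decoding consumes the BOM; encoding the result as ASCII gives plain text.
ascii_bytes = with_bom.decode('utf-16').encode('ascii')
print(ascii_bytes)
```

The final encode() raises UnicodeEncodeError if the text contains anything outside ASCII, so an error policy such as 'replace' or 'ignore' may be wanted for real data.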



