WTF? Printing unicode strings

Robert Kern robert.kern at gmail.com
Thu May 18 19:35:07 EDT 2006


Ron Garret wrote:

> I'm using an OS X terminal to ssh to a Linux machine.

Click on the "Terminal" menu, then "Window Settings...". Choose "Display" from
the combobox. At the bottom you will see a combobox titled "Character Set
Encoding". Choose "Unicode (UTF-8)".

> But what about this:
> 
>>>>f2=open('foo','w')
>>>>f2.write(u'\xFF')
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in 
> position 0: ordinal not in range(128)
> 
> That should have nothing to do with my terminal, right?

Correct, that is a different problem. f.write() expects a string of bytes, not a
unicode string. To convert a unicode string to a byte string without an
explicit .encode() call, Python uses the default encoding, which is 'ascii'.
That default is deliberately hard to change, and for good reason: if you hack
that setting, your modules won't work on anyone else's machine.

> I just found http://www.amk.ca/python/howto/unicode, which seems to be 
> enlightening.  The answer seems to be something like:
> 
> import codecs
> f = codecs.open('foo','w','utf-8')
> 
> but that seems pretty awkward.

<shrug> About as clean as it gets when dealing with text encodings.
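For reference, a short sketch of the codecs.open() approach quoted above,
assuming Python 3 and a temporary directory in place of a real 'foo'. The
wrapper returned by codecs.open() encodes each unicode string with the named
codec on write and decodes on read, so no explicit .encode() call is needed.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'foo')

# The wrapper encodes unicode strings to UTF-8 bytes as they are written.
with codecs.open(path, 'w', 'utf-8') as f:
    f.write(u'\xff')

# Reading through the same codec decodes the bytes back to text.
with codecs.open(path, 'r', 'utf-8') as f:
    text = f.read()

# The raw bytes on disk are the UTF-8 encoding of U+00FF.
with open(path, 'rb') as raw:
    data = raw.read()

print(data)  # b'\xc3\xbf'
```

In Python 3 the built-in open(path, 'w', encoding='utf-8') does the same job
and is the more common spelling.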

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco




More information about the Python-list mailing list