Unicode problems, yet again

Sat Apr 23 22:57:53 EDT 2005

On Sun, 24 Apr 2005 03:15:02 +0200, Ivan Voras
<ivoras at something.ortheother> wrote:

>I have a string fetched from database, in iso8859-2, with 8bit 
>characters,

"8bit characters"?? Maybe you did once, or you thought you did, but
what you have now is a Unicode string, and the write() in socket.py
(the one in your traceback) is expecting an ordinary 8-bit string.

> and I'm trying to send it over the network, via a socket:
>
>   File "E:\Python24\lib\socket.py", line 249, in write
>     data = str(data) # XXX Should really reject non-string non-buffers
>UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in 
>position 123: ordinal not in range(128)

Like it says: str(data) tries to encode your *UNICODE* string with the
default 'ascii' codec, and that codec cannot handle the u'\u0161'
(the small s with caron) at position 123.
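
To see the same failure without a socket in the way, here is a minimal
sketch. The string is made up: 123 filler characters so that the
offending one lands where your traceback says it does. The explicit
.encode('ascii') stands in for the implicit encoding that str() does on
a unicode object in Python 2, and the "except ... as" spelling needs a
newer Python than the 2.4 in your traceback.

```python
# Hypothetical data: 123 ASCII characters followed by s-with-caron,
# chosen only to mirror the position reported in the traceback.
text = u'x' * 123 + u'\u0161'

try:
    # Explicit version of what str(unicode_object) does implicitly in Python 2.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    failed_at = exc.start              # index of the first unencodable character
    bad_char = exc.object[exc.start]   # the character itself
    print("cannot encode %r at position %d" % (bad_char, failed_at))
```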

>
>The other end knows it should expect this encoding, so how to send it?
>

If the other end wants an encoding, then you should *encode* it, like
this:

>>> us = u'\u0161'
>>> s = us.encode('iso8859_2')
>>> s
'\xb9'
>>> str(us)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0161' in
position 0: ordinal not in range(128)
>>> str(s)
'\xb9'
>>> # looks like socket.write() might be happier with this.
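
Put together, a rough sketch of "encode first, then send" might look
like the following. The helper name send_text and the sample word are
invented for illustration, and socket.socketpair() merely stands in for
your real connection (it is not available on every platform); the only
assumption taken from your post is that both ends have agreed on
iso8859-2.

```python
import socket

def send_text(sock, text, encoding='iso8859_2'):
    """Encode a unicode string to bytes and send all of it (hypothetical helper)."""
    data = text.encode(encoding)   # unicode -> 8-bit string in the agreed encoding
    sock.sendall(data)
    return data

# A connected pair of local sockets plays the role of the two ends.
sender, receiver = socket.socketpair()
try:
    sent = send_text(sender, u'\u0161koda')
    received = receiver.recv(1024)
finally:
    sender.close()
    receiver.close()

# The receiving end reverses the step with the same encoding.
decoded = received.decode('iso8859_2')
```

The point of the design is that the encoding is chosen explicitly at
the boundary, instead of leaving it to whatever default str() uses.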

>(Does anyone else feel that python's unicode handling is, well... 
>suboptimal at least?)

Your posting gives no evidence for such a conclusion.