[Tutor] converting encoded symbols from rss feed?

Kent Johnson kent37 at tds.net
Thu Jun 18 23:32:09 CEST 2009


On Thu, Jun 18, 2009 at 4:37 PM, Serdar Tumgoren<zstumgoren at gmail.com> wrote:

> On the above link, the section on "Encoding Unicode Byte Streams" has
> the following example:
>
>>>> u = u"abc\u2013"
>>>> print u
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 3: ordinal not in range(128)
>>>> print u.encode("utf-8")
> abc–
>
> But when I try the same example on my Windows XP machine (with Python
> 2.5.4), I can't get the same results. Instead, it spits out the below
> (hopefully it renders properly and we don't have encoding issues!!!):
>
> $ python
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> x = u"abc\u2013"
>>>> print x
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "C:\Program Files\Python25\lib\encodings\cp437.py", line 12, in encode
>    return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
>  3: character maps to <undefined>
>>>> x.encode("utf-8")
> 'abc\xe2\x80\x93'
>>>> print x.encode("utf-8")
> abcΓÇô

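The example is written assuming the console encoding is utf-8. Yours
seems to be cp437, so the three UTF-8 bytes of the dash are displayed
as three separate cp437 characters. You can check the console encoding
like this: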
C:\Project\Mango> py
Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

In [1]: import sys

In [2]: sys.stdout.encoding
Out[2]: 'cp437'
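
To see where the "ΓÇô" in your output comes from, here is a small sketch
that decodes the UTF-8 bytes as cp437, which is roughly what the console
does when it displays them. The variable names are just for illustration,
and the code is written so it should run on Python 2.6+ as well as
Python 3:

```python
text = u"abc\u2013"                 # "abc" followed by U+2013
utf8_bytes = text.encode("utf-8")   # the bytes 'abc\xe2\x80\x93'

# A cp437 console treats each byte as one character:
# 0xE2 -> U+0393, 0x80 -> U+00C7, 0x93 -> U+00F4
as_shown_by_cp437 = utf8_bytes.decode("cp437")

print(repr(as_shown_by_cp437))
```

Those three code points are the Greek capital gamma, C with cedilla and o
with circumflex from your session.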

But there is another problem - \u2013 is an en dash, which does not
appear in cp437, so even giving the correct encoding doesn't work. Try
this, with a character that cp437 does contain:
In [6]: x = u"abc\u2591"

In [7]: print x.encode('cp437')
------> print(x.encode('cp437'))
abc░
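
If you need to print text that may contain characters the console cannot
show, one option is to replace them instead of letting print raise. This
is only a sketch: safe_console_text is a hypothetical helper name, and
falling back to "ascii" when sys.stdout.encoding is missing is my
assumption, not something Python requires.

```python
import sys

def safe_console_text(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Assumption: fall back to ASCII when stdout reports no encoding
    # (for example when output is piped on Python 2).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with the "replace" error handler, then decode again so the
    # result is still a text (unicode) string.
    return text.encode(encoding, "replace").decode(encoding)

# U+2013 is not in cp437, so it is replaced by "?"
print(safe_console_text(u"abc\u2013", "cp437"))
```

A character that cp437 does have, such as \u2591, passes through
unchanged.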


> In a related test, I was unable to change the default character encoding
> for the python interpreter from ascii to utf-8. In all cases (cygwin,
> Wing IDE, windows command line), the interpreter reported that my
> "sys" module does not contain the "setdefaultencoding" method (even
> though this should be part of the module from versions 2.x and above).

sys.setdefaultencoding is deleted by site.py on python startup. You have
to set the default encoding from within a sitecustomize.py module. But
it's usually better to get a correct understanding of what is going on
and to leave the default encoding alone.
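
As an illustration of leaving the default alone, you can name the
encoding explicitly wherever text leaves or enters your program. The
sketch below does this for a file; write_text and read_text are
hypothetical helper names, utf-8 is just an example choice, and io.open
assumes Python 2.6 or later (it is not available in 2.5).

```python
import io
import os
import tempfile

def write_text(path, text, encoding="utf-8"):
    """Write a unicode string to path using an explicit encoding."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)

def read_text(path, encoding="utf-8"):
    """Read path back into a unicode string using an explicit encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()

# Round trip through a temporary file; the default encoding is never used.
path = os.path.join(tempfile.mkdtemp(), "dash.txt")
write_text(path, u"abc\u2013")
round_tripped = read_text(path)
```

Because the encoding is stated at each boundary, the result does not
depend on the interpreter's default encoding or on the console.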

Kent

