Unicode

Dave Angel d at davea.name
Mon Dec 17 13:07:46 EST 2012


On 12/17/2012 12:43 PM, Anatoli Hristov wrote:
>> Hi,
>> I don't know what the product ID would look like for this page, but
>> assuming the catalog pages are utf-8 encoded, like the error page I
>> get, it should work ok; cf.:
> You are right, I got it to work on Windows too, but not on Linux. I
> changed the codec on Linux, but it still doesn't work.
>
> Here is what I get from Linux:
>
> >>> import urllib
> >>> opener = urllib.FancyURLopener({})
> >>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (14688538))
> >>> src = ffr.read()
> >>> print src.decode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122'
> in position 17167: ordinal not in range(256)

I can tell you what's happening, but maybe not how to fix it.

src.decode("utf-8") creates a unicode string, and the error is not
happening there.  It comes from print: to write a unicode string to the
terminal, print has to encode it, using sys.stdout.encoding.  For
whatever reason, yours is using latin-1, and the string contains
u'\u2122' (the trade mark sign), which does not exist in latin-1.
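
Here is a minimal sketch of the same failure without the network fetch.
The sample string is made up for illustration (it is not the icecat
page), and the two explicit encode calls are possible workarounds under
the assumption that you can choose the output encoding yourself, not
necessarily the right fix for your setup.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: the sample text is hypothetical, not the real page.
text = u"Brand\u2122 product"   # U+2122 TRADE MARK SIGN

# Roughly what print attempts on a terminal whose encoding is latin-1.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False           # same kind of error as in the traceback

# Workaround 1: encode explicitly to utf-8 (fine if the terminal shows utf-8).
as_utf8 = text.encode("utf-8")

# Workaround 2: stay in latin-1, replacing unencodable characters with "?".
as_latin1 = text.encode("latin-1", "replace")
```

In Python 2, printing as_utf8 or as_latin1 writes those bytes as they
are, so print no longer has to pick an encoding for you.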

My Python 2.7 uses utf-8 everywhere (on Ubuntu Linux 11.04).
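
To compare the two machines, one could check which encodings Python
picked up on each.  A small sketch using only the standard library;
report_encodings is a name I made up for this example.

```python
import locale
import sys

def report_encodings():
    """Return the encodings that affect how print handles unicode text."""
    return {
        # Encoding used for unicode output; may be None if stdout is redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale settings (LANG, LC_ALL, ...).
        "locale": locale.getpreferredencoding(),
        # Default used for implicit str/unicode conversions.
        "default": sys.getdefaultencoding(),
    }

for name, value in sorted(report_encodings().items()):
    print("%s: %s" % (name, value))
```

If the Linux box reports ISO-8859-1 or latin-1 for stdout or locale,
that would be consistent with the traceback, and the locale settings
would be the first place to look.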


-- 

DaveA



