A question about unicode() function

Tim Roberts timr at probo.com
Mon Jan 1 02:29:05 EST 2007


"JTree" <eastera at gmail.com> wrote:
>
>Hi, all,
>     I encountered a problem when using the unicode() function on a fetched
>webpage, and I don't know why this happened.
>     My code and error messages are:
>
>
>Code:
>#!/usr/bin/python
>#Filename: test.py
>#Modified: 2006-12-31
>
>import cPickle as p
>import urllib
>import htmllib
>import re
>import sys
>
>def funUrlFetch(url):
>    lambda url:urllib.urlopen(url).read()
>
>objUrl = raw_input('Enter the Url:')
>content = funUrlFetch(objUrl)
>content = unicode(content,"gbk")
>print content
>content.close()

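Once you fix the lambda, as Felipe described (the lambda inside funUrlFetch
is created and immediately thrown away, so the function returns None), there
is another issue here.  You are telling unicode() that the string you are
passing it is an 8-bit string encoded as GBK.  How do you know that?  In your
specific example, www.msn.com, I can guarantee it will produce the wrong
results: www.msn.com serves its pages encoded in UTF-8, so any non-ASCII
bytes will either decode to the wrong characters or raise UnicodeDecodeError.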
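A minimal sketch of the alternative: take the encoding from the charset the
server declares instead of hard-coding "gbk".  This is written for Python 3,
not the Python 2 used above: unicode(content, "gbk") becomes
content.decode("gbk"), and urllib.urlopen becomes urllib.request.urlopen.
The function names decode_body and fetch_url are hypothetical, and falling
back to UTF-8 when no charset is declared is an assumption.  It only looks at
the HTTP Content-Type header; a page may instead declare its encoding in an
HTML meta tag, which this sketch ignores.  It also drops the final
content.close() from the quoted code, since a decoded string has no close()
method.

```python
import email.message
import urllib.request


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset declared in the headers.

    `headers` is an email.message.Message; the response.headers object
    that urllib returns is a subclass of it.  When the Content-Type header
    has no charset parameter, fall back to `default` (an assumption).
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset)


def fetch_url(url):
    """Fetch a page and return it as text (unlike funUrlFetch, it returns)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        return decode_body(raw, response.headers)


# Offline usage with a hand-built header instead of a live fetch:
demo_headers = email.message.Message()
demo_headers["Content-Type"] = "text/html; charset=GBK"
demo_text = decode_body("中文".encode("gbk"), demo_headers)
```

With network access, text = fetch_url("http://www.msn.com/") would then
decode with whatever charset the server reports rather than a guess.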
-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.