A question about unicode() function

Mon Jan 1 02:07:08 EST 2007

Hi,

I changed my code to:

#!/usr/bin/python
#Filename: test.py
#Modified: 2007-01-01

import cPickle as p
import urllib
import htmllib
import re
import sys

funUrlFetch =  lambda url:urllib.urlopen(url).read()

objUrl = raw_input('Enter the Url:')
content = funUrlFetch(objUrl)
content = content.encode('gb2312','ignore')
print content
content.close()

I used "ignore" to deal with the data loss, but it still caused an
error:

C:\WINDOWS\system32\cmd.exe /c python tianya.py
Enter the Url:http://www.tianya.cn
Traceback (most recent call last):
  File "tianya.py", line 17, in ?
    content = content.encode('gb2312','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xbb in position
88: ordinal not in range(128)
shell returned 1
Hit any key to close this window...

My Python version is 2.4. Does it have some problems with Asian
encoding support?
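
For what it is worth, the traceback probably does not point to missing
Asian codec support. In Python 2, urllib's read() returns a byte string,
and calling .encode() on a byte string first decodes it implicitly with
the ascii codec. That hidden step is strict, so it fails on byte 0xbb
before "ignore" is ever consulted. Below is a minimal sketch of the
decode-then-encode order. It assumes the page really is GB2312, and the
sample string is a hypothetical stand-in for the fetched content:

```python
# Hypothetical sample data: two Chinese characters as GB2312 bytes,
# standing in for what urllib.urlopen(url).read() would return.
raw = u'\u4e2d\u6587'.encode('gb2312')

# Reproduce the hidden step: decoding non-ASCII bytes as ascii fails.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Bytes must be *decoded* to get unicode text; 'ignore' drops bad bytes.
text = raw.decode('gb2312', 'ignore')

# Encoding goes the other way: unicode text -> bytes in a target encoding.
utf8_bytes = text.encode('utf-8')
```

In the script above, that would mean calling content.decode('gb2312',
'ignore') instead of content.encode(...). The later content.close()
would also fail once reached, because a string has no close() method.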

Thanks!

On Dec 31 2006, 9:30 pm, "Felipe Almeida Lessa"
<felipe.le... at gmail.com> wrote:
> On 31 Dec 2006 05:20:10 -0800, JTree <east... at gmail.com> wrote:
>
> > def funUrlFetch(url):
> >     lambda url:urllib.urlopen(url).read()
>
> This function only creates a lambda function (that is not used or
> assigned anywhere), nothing more, nothing less. Thus, it returns None
> (sort of "void") no matter what is its argument. Probably you meant
> something like
>
> def funUrlFetch(url):
>     return urllib.urlopen(url).read()
>
> or
>
> funUrlFetch = lambda url:urllib.urlopen(url).read()
>
> > objUrl = raw_input('Enter the Url:')
> > content = funUrlFetch(objUrl)
>
> content gets assigned None. Try putting "print content" before the
> unicode line.
>
> > content = unicode(content,"gbk")
>
> This, equivalent to unicode(None, "gbk"), leads to
>
> > TypeError: coercing to Unicode: need string or buffer, NoneType found
>
> None's are not strings nor buffers, so unicode() complains.
> 
> See ya,
> 
> --
> Felipe.
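
The point about the missing return can be checked without any network
access. This is only a sketch: the function names are hypothetical, and
url.upper() stands in for urllib.urlopen(url).read() so the three forms
can be compared side by side:

```python
def fetch_broken(url):
    # Builds a lambda object and discards it; nothing is returned.
    lambda url: url.upper()

def fetch_fixed(url):
    # An explicit return hands the result back to the caller.
    return url.upper()

# Assigning the lambda to a name gives a callable equivalent to fetch_fixed.
fetch_lambda = lambda url: url.upper()

broken_result = fetch_broken('abc')
fixed_result = fetch_fixed('abc')
lambda_result = fetch_lambda('abc')
```

Here broken_result is None, while the other two hold the transformed
string, which matches the TypeError that unicode(None, "gbk") raised.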