Can't get the real contents from page in internet as the tag "no-cache"

John J. Lee jjl at pobox.com
Thu Mar 23 17:40:06 EST 2006


"dongdong" <dongdonglove8 at hotmail.com> writes:

> Oh! My thanks to Tim Roberts and everyone above!
>  I see now, the different URLs are the cause!
>  The contents can only be fetched from the later (real) URL.
>  I made the mistake of not noticing that different URLs were taking effect.

If you use ClientCookie.urlopen() in place of urllib2.urlopen(), it
can handle Refreshes and HTTP-EQUIV for you transparently.

You do have to explicitly ask for that functionality, though:

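import ClientCookie

# Build an opener that understands HTTP-EQUIV meta tags and Refresh
# headers, then install it as the default opener.
opener = ClientCookie.build_opener(ClientCookie.HTTPEquivProcessor,
                                   ClientCookie.HTTPRefreshProcessor,
                                   )
ClientCookie.install_opener(opener)

# url is the address of the page you want to fetch
print ClientCookie.urlopen(url).read()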
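
To make the HTTP-EQUIV part concrete, here is a minimal sketch of the
underlying idea: scan the HTML for <meta http-equiv=...> tags and report
them as header-like pairs.  It is written for Python 3's standard library
(html.parser) rather than the Python 2 used above, and it is not
ClientCookie's actual implementation.  The names MetaEquivParser and
http_equiv_headers, and the sample page, are made up for illustration.

```python
from html.parser import HTMLParser


class MetaEquivParser(HTMLParser):
    """Collect (name, value) pairs from <meta http-equiv=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("http-equiv")
        value = attr_map.get("content")
        if name and value is not None:
            self.pairs.append((name, value))


def http_equiv_headers(html):
    """Return the header-like pairs declared via <meta http-equiv> in html."""
    parser = MetaEquivParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical page: a "no-cache" stub that points at the real URL.
page = """<html><head>
<meta http-equiv="Cache-Control" content="no-cache">
<meta http-equiv="Refresh" content="0; url=http://example.com/real">
</head><body></body></html>"""

for name, value in http_equiv_headers(page):
    print(name, "=>", value)
```

A page like this is why the first fetch returns only a stub: the real
contents live at the URL named in the Refresh value.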


If you want to do even less of this stuff "by hand", class Browser
from module mechanize is a subclass of the class of "opener" above,
but behaves much more like a web browser in various ways.  Still
alpha, but very near now to stable release.


FWIW, you can also use ClientCookie.HTTPRefreshProcessor,
ClientCookie.HTTPEquivProcessor etc. with Python 2.4's urllib2, as
long as you follow the instructions under the heading "Notes about
ClientCookie, urllib2 and cookielib" in the ClientCookie README file
(specifically, if you want to use ClientCookie.HTTPRefreshProcessor with
Python 2.4's urllib2, you must also use
ClientCookie.HTTPRedirectHandler).
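
For comparison, the sketch below shows how a response processor of this
kind can plug into the opener chain of Python 3's urllib.request (the
successor of urllib2).  It is an illustration under assumptions, not
ClientCookie's code: RefreshFollower, parse_refresh and max_delay are
hypothetical names, the handler follows the Refresh immediately instead of
waiting out the delay, and it re-opens the target directly rather than
routing it through a redirect handler.

```python
import re
import urllib.request
from urllib.parse import urljoin


def parse_refresh(value):
    """Return (delay_seconds, url_or_None) for a value like '0; url=/real'.

    Returns None if the delay is not a number.
    """
    delay_text, _, rest = value.partition(";")
    try:
        delay = float(delay_text.strip())
    except ValueError:
        return None
    # Drop an optional "url=" prefix and surrounding quotes.
    rest = re.sub(r"^\s*url\s*=\s*", "", rest, flags=re.I).strip().strip("'\"")
    return delay, (rest or None)


class RefreshFollower(urllib.request.BaseHandler):
    """Hypothetical handler: follow short 'Refresh' response headers."""

    def __init__(self, max_delay=5.0):
        # Refreshes with a longer delay are left alone (assumed policy).
        self.max_delay = max_delay

    def http_response(self, request, response):
        value = response.headers.get("Refresh")
        parsed = parse_refresh(value) if value else None
        if parsed and parsed[1] and parsed[0] <= self.max_delay:
            target = urljoin(response.url, parsed[1])
            # Only the trivial self-refresh is guarded against here.
            if target != response.url:
                return self.parent.open(target)
        return response

    https_response = http_response


# Building the opener does not touch the network.
opener = urllib.request.build_opener(RefreshFollower)
# html = opener.open(url).read()   # url: the page you want to fetch
```

Note that this only reacts to a real Refresh header; a Refresh declared in
a <meta http-equiv> tag would first have to be lifted out of the HTML.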


John



