Python Google Server

Fuzzyman fuzzyman at gmail.com
Tue Apr 5 09:16:34 EDT 2005


vegetax wrote:
> it works on opera and firefox on linux, but you can't search in the cached
> google! it would be more useful if you could somehow search "only" in the
> cache instead of putting the straight link. maybe you could put a magic url
> to search in the cache, like search:"search terms"
>

Thanks for the report. I've also tried it with firefox on windows.

Yeah - google search results aren't cached !! Perhaps anything in a
google domain ought to pass straight through. That could be done by
testing the domain and using urllib2 to fetch the page.

Have just tested the following which works.

Add the following two lines to the start of the code :

import urllib2
txheaders = { 'User-agent' : 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)' }

Then change the start of the send_head method to this :

    def send_head(self):
        """Only GET implemented for this.
        This sends the response code and MIME headers.
        Return value is a file object, or None.
        """
        print 'Request :', self.path # traceback to sys.stdout
        url_tuple = urlparse.urlparse(self.path)
        url = url_tuple[2]
        domain = url_tuple[1]
        if domain.find('.google.') != -1:   # bypass the cache for google domains
            req = urllib2.Request(self.path, None, txheaders)
            return urllib2.urlopen(req)
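
For anyone trying this on Python 3, here is a rough standalone sketch of the same idea: test the domain, then fetch the page directly. It is not part of the script above. The names is_google_domain, fetch_direct and TXHEADERS are made up for illustration, and the check is a slightly different one (it compares whole hostname labels, so a bare "google.com" also matches) rather than the find('.google.') test.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Same browser-like User-agent string as in the post (hypothetical constant name).
TXHEADERS = {
    'User-agent': 'Mozilla/4.0 (compatible; MSIE 6.0; '
                  'Windows NT 5.1; SV1; .NET CLR 1.1.4322)'
}


def is_google_domain(url):
    """Return True if one label of the URL's hostname is exactly 'google'.

    Assumption: this is a looser illustration of the domain test, not the
    original check. It matches 'www.google.com' and 'google.co.uk', but
    not 'googlegroups.com'.
    """
    host = urlparse(url).hostname or ''
    return 'google' in host.lower().split('.')


def fetch_direct(url):
    """Fetch the page itself, bypassing the cache; returns a file-like object."""
    req = Request(url, None, TXHEADERS)
    return urlopen(req)
```

In a proxy handler, self.path holds the full requested URL, so the equivalent of the two lines above would be something like "if is_google_domain(self.path): return fetch_direct(self.path)".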


> fuzzyman at gmail.com wrote:
>
> > I've hacked together a 'GoogleCacheServer'. It is based on
> > SimpleHTTPServer. Run the following script (hopefully google groups
> > won't mangle the indentation) and set your browser proxy settings to
> > 'localhost:8000'. It will let you browse the internet using google's
> > cache. Obviously you'll miss images, javascript, css files, etc.
> > 
> > See the world as google sees it !
[snip..]



