any one used googles api?

Alan James Salmoni alan_salmoni at yahoo.com
Mon Dec 22 16:42:48 EST 2003


Bill Sneddon <bsneddonNOspam at yahoo.com> wrote in message news:<brv88k$kkq$1 at ngspool-d02.news.aol.com>...
> Has anyone used Google's API who would be willing to share
> a simple example?  I have been wanting to play around with
> SOAP for a while and this looks like a place to start.
> 
> I am going to mess with it in ASP when our IT guys set it up
> for me.
> 
> 
> I like Python but have not done anything with SOAP yet.
> 
> http://www.google.com/apis/

Hi Bill,

As another poster suggested, I would recommend using PyGoogle. I
used it to retrieve materials for a number of my own experiments and
found that it was rather good - more effective than just spoofing a
query (which I used to do). Here's an example that I used to retrieve
a list of returns (document titles and "snippets") from Google.
Btw - a "concern" is just a search string for this study.

# program to retrieve loads of results from Google.
# (c) 2002 Alan James Salmoni, HCI Group, Cardiff University

# reminder: 10 concerns, 30 summaries each!
# concerns are strings in a file called "concerns.txt"
import google, os, os.path

google.LICENSE_KEY = 'heeheenottellingyou!' # must get your own!
fin = open('concerns.txt','r')
fout = open('results.exp6.txt','w')
deadpage = '<html><body>No Page</body></html>'


# this snippet gets 100 results for each of 10 concerns
#titles = summaries = urls = [] 



# skip the first 11 lines of concerns.txt
for i in range(11):
    fin.readline()

for i in range(0,9):
    concernstring = fin.readline()
    k = 0
    data = google.doGoogleSearch(concernstring, 0+k,
                                 10,1,'',1,'lang_en','','')
    fout.write(concernstring) # record the concern on the file
    # record the estimated number of results
    fout.write(str(data.meta.estimatedTotalResultsCount)+'\n')
    for k in range(0,100,10):
        # search for "concernstring" indices 0-100, filtering out
        # similar results, no restrictions, safesearch ON, in English
        # and no input or output encoding!
        data = google.doGoogleSearch(concernstring, 0+k,
                                     10,1,'',1,'lang_en','','')
        for j in range(10):
            #print 'concern: '+str(i)+'  block: '+str(k)+'  result: '+str(j)

            if  (j+k) < data.meta.estimatedTotalResultsCount:
                if not os.path.isdir(str(i)):
                    os.mkdir(str(i))
                # <bits snipped out - nothing to do with Google - honest!>
                # (aliveflag and fout2 are presumably set up in the snipped part)
                try:
                    fout.write(data.results[j].URL+'\n')
                except:
                    fout.write('NONE\n')
                    aliveflag = False
                if aliveflag:
                    page = google.doGetCachedPage(data.results[j].URL)
                    print str(i)+':'+str(j+k+1)+": retrieved page "+data.results[j].URL
                else:
                    page = deadpage
                    print str(j+1)+': Page DEAD '+data.results[j].URL
                fout2.write(page)
                fout2.close()

fout.close()
fin.close()


This line (data = google.doGoogleSearch(concernstring, 0+k,
10,1,'',1,'lang_en','','')) sends a search request (held in the
variable "concernstring") and asks for a list of results. The rank
starts from 0 + k and I've asked for 10. The next bits (1, '', 1)
should be, going by my comment in the script, the similar-results
filter, the restriction (none) and SafeSearch (on); after that, I'm
requesting English language docs. The last 2 are the input and output
encodings (left empty), but the PyGoogle module has the information
you need!
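
For anyone puzzling over the positional arguments in that call, here
is a rough sketch that spells them out by name and builds one set of
arguments per page of ten results. The key names (q, start,
maxResults, filter, restrict, safeSearch, language, inputencoding,
outputencoding) are an assumption based on the comment in the script
and on PyGoogle's doGoogleSearch signature as best recalled, so check
them against your copy of the module. page_requests is a hypothetical
helper, not part of PyGoogle.

```python
def page_requests(query, total=100, page_size=10):
    """Yield one dict of search arguments per page of results.

    Hypothetical helper: the key names are assumed to match the
    keyword arguments of PyGoogle's doGoogleSearch.
    """
    for start in range(0, total, page_size):
        yield {
            'q': query.strip(),        # search string, trailing newline removed
            'start': start,            # zero-based rank of the first result
            'maxResults': page_size,   # the script asks for 10 per call
            'filter': 1,               # filter out similar results
            'restrict': '',            # no restrictions
            'safeSearch': 1,           # SafeSearch on
            'language': 'lang_en',     # English documents only
            'inputencoding': '',       # no input encoding
            'outputencoding': '',      # no output encoding
        }


pages = list(page_requests('python soap example\n'))
print(len(pages))
print(pages[3]['start'])
```

If the names do match, each dict could be handed over with
google.doGoogleSearch(**page) once PyGoogle is imported and a licence
key is set. Stripping the query is a small change from the script,
which passes the line read from the file with its newline attached.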

There's lots of other stuff like google.doGetCachedPage(URL), which
gets the entire page from the Google cache.
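
The script pairs doGetCachedPage with its deadpage string so that a
missing page still gets something written to disk. A minimal sketch of
that fallback idea follows; fetch_or_placeholder and fake_fetch are
hypothetical names made up for illustration, and the stand-in fetcher
is only there so the example runs without PyGoogle or a licence key.

```python
DEAD_PAGE = '<html><body>No Page</body></html>'


def fetch_or_placeholder(fetch, url, placeholder=DEAD_PAGE):
    """Return fetch(url), or the placeholder if the fetch fails or is empty."""
    try:
        page = fetch(url)
    except Exception:
        return placeholder
    return page or placeholder


def fake_fetch(url):
    """Stand-in for a real fetcher; pretends 'dead' URLs have no cached copy."""
    if 'dead' in url:
        raise IOError('no cached copy')
    return '<html><body>cached copy of %s</body></html>' % url


live = fetch_or_placeholder(fake_fetch, 'http://example.com/')
dead = fetch_or_placeholder(fake_fetch, 'http://example.com/dead')
print(live)
print(dead)
```

With PyGoogle available, google.doGetCachedPage could be passed in
place of fake_fetch.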

The code is a bit hairy (yeah, quick'n'dirty scripting - dontcha love
it!), and I haven't looked at it for a while (over a year), but it
does make sense to me! (Another one up for Python!) Let me know if
you have any questions.

Alan James Salmoni
SalStat Statistics
http://salstat.sourceforge.net



