any one used googles api?
Alan James Salmoni
alan_salmoni at yahoo.com
Mon Dec 22 16:42:48 EST 2003
Bill Sneddon <bsneddonNOspam at yahoo.com> wrote in message news:<brv88k$kkq$1 at ngspool-d02.news.aol.com>...
> Has anyone used googles API who would be willing to share
> a simple example. I have been wanting to play around with
> SOAP for a while and this looks like a place to start.
>
> I am going to mess with it in ASP when our IT guys set it up
> for me.
>
>
> I like Python but have not done anything with SOAP yet.
>
> http://www.google.com/apis/
Hi Bill,

Like another poster suggested, I would recommend using PyGoogle. I
used it to retrieve materials for a number of my own experiments and
found that it was rather good - more effective than just spoofing a
query (which I used to do). Here's an example that I used to retrieve
a list of returns (document titles and "snippets") from Google.

btw - a "concern" is just a search string for this study.
# program to retrieve loads of results from Google.
# (c) 2002 Alan James Salmoni, HCI Group, Cardiff University
# reminder: 10 concerns, 30 summaries each!
# concerns are strings in a file called "concerns.txt"

import google, os, os.path

google.LICENSE_KEY = 'heeheenottellingyou!'  # must get your own!
fin = open('concerns.txt', 'r')
fout = open('results.exp6.txt', 'w')
deadpage = '<html><body>No Page</body></html>'

# this snippet gets 100 results for each of 10 concerns
#titles = summaries = urls = []
for i in range(11):
    fin.readline()
for i in range(0, 9):
    concernstring = fin.readline()
    k = 0
    data = google.doGoogleSearch(concernstring, 0+k, 10, 1, '', 1,
                                 'lang_en', '', '')
    fout.write(concernstring)  # record the concern on the file
    fout.write(str(data.meta.estimatedTotalResultsCount)+'\n')  # record # results
    for k in range(0, 100, 10):
        # search for "concernstring", indices 0-100, filtering out
        # similar results, no restrictions, safesearch ON, in English
        # and no input or output encoding!
        data = google.doGoogleSearch(concernstring, 0+k, 10, 1, '', 1,
                                     'lang_en', '', '')
        for j in range(10):
            #print 'concern: '+str(i)+' block: '+str(k)+' result: '+str(j)
            if (j+k) < data.meta.estimatedTotalResultsCount:
                if not os.path.isdir(str(i)):
                    os.mkdir(str(i))
                # <bits snipped out - nothing to do with Google - honest!>
                # (fout2 and aliveflag get set up in the snipped part)
                try:
                    fout.write(data.results[j].URL+'\n')
                except:
                    fout.write('NONE\n')
                    aliveflag = False
                if aliveflag:
                    page = google.doGetCachedPage(data.results[j].URL)
                    print str(i)+':'+str(j+k+1)+': retrieved page '+data.results[j].URL
                else:
                    page = deadpage
                    print str(j+1)+': Page DEAD '+data.results[j].URL
                fout2.write(page)
                fout2.close()
fout.close()
fin.close()
The line

data = google.doGoogleSearch(concernstring, 0+k, 10, 1, '', 1, 'lang_en', '', '')

sends a search request (held in the variable "concernstring") and asks
for a list of results. The rank starts from 0+k and I've asked for 10
results per call. If I remember the API right, the next arguments are
filter (1 = drop near-duplicate results), restrict (empty = no topic
restriction) and safeSearch (1 = on); 'lang_en' asks for English
language docs, and the last two are the input and output encodings
(empty for the defaults). The PyGoogle module has the information you
need!
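The paging arithmetic in the k loop above can be pulled out into a
little helper, something like this (the function name and the early
cut-off at the estimated total are my own additions, not part of
PyGoogle):

```python
def result_blocks(total, blocksize=10, maxresults=100):
    # start offsets for paging through up to maxresults hits,
    # never asking for a block past the estimated total
    return [k for k in range(0, min(total, maxresults), blocksize)]
```

So a concern with an estimated 43 hits only needs blocks starting at
0, 10, 20, 30 and 40, instead of the full ten requests.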
There's lots of other stuff too, like google.doGetCachedPage(URL),
which gets the entire page from the Google cache.
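The dead-page fallback in my script boils down to this pattern - try
the cache, and substitute a stub page if the fetch blows up. Here
"fetch" just stands in for google.doGetCachedPage, and the helper name
is my own:

```python
DEADPAGE = '<html><body>No Page</body></html>'

def cached_page_or_dead(fetch, url, deadpage=DEADPAGE):
    # try to pull the page from the cache; on any failure,
    # hand back the stub so the output files stay aligned
    try:
        return fetch(url)
    except Exception:
        return deadpage
```

Writing the stub instead of skipping keeps one saved page per result,
so result numbering stays in step with the URL list.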
The code is a bit hairy (yeah, quick'n'dirty scripting - dontcha love
it!), and I haven't looked at it for a while (over a year), but it
still makes sense to me (another one up for Python!). Let me know if
you have any questions.
Alan James Salmoni
SalStat Statistics
http://salstat.sourceforge.net