warning for google api users

William wverheul at gmail.com
Wed Feb 22 04:30:40 EST 2006


Isn't this because the index that the API uses is (a lot) older than
the index used by www.google.com? Total result counts are always
estimates, so they are not reliable (as the variance shows).
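
To put a number on that variance, here is a minimal sketch that tallies the
totals from the ten result pages quoted below. The list is copied by hand from
Gabriel's message; `summarize_totals` is a name made up for this illustration
and is not part of pygoogle or the Google API.

```python
from collections import Counter

# Totals reported for result pages 1-10 through 91-100,
# copied by hand from the quoted message.
API_TOTALS = [373000, 151000, 151000, 373000, 373000,
              373000, 151000, 373000, 151000, 373000]
WEB_TOTAL = 2050000  # total shown by the regular search, per the same message


def summarize_totals(totals):
    """Return (Counter of distinct totals, spread between max and min)."""
    counts = Counter(totals)
    spread = max(totals) - min(totals)
    return counts, spread


counts, spread = summarize_totals(API_TOTALS)
for value, n in counts.most_common():
    print("%d reported on %d of %d pages" % (value, n, len(API_TOTALS)))
print("spread between API estimates: %d" % spread)
print("largest API estimate / web total: %.2f"
      % (max(API_TOTALS) / float(WEB_TOTAL)))
```

Only two distinct values show up, which would fit the idea of a couple of
different index snapshots rather than random noise in the estimate.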

Gabriel B. wrote:

> The Google web services (aka the Google API) are not even close to
> ready for any kind of real use yet.
>
> If you search for the same term 10 times, you get 3 different totals,
> 2 different result orders, and one or two "502 bad gateway" errors.
>
> I did an extensive comparison of the API against the regular search
> service. The most typical set of results:
>
> results 1-10; total: 373000
> results 11-20; total: 151000
> results 21-30; total: 151000
> results 31-40; total: 373000
> results 41-50; total: 373000
> results 51-60; total: 373000
> results 61-70; total: 151000
> ( 502 bad gateway. retry)
> results 71-80; total: 373000
> results 81-90; total: 151000
> ( 502 bad gateway. retry)
> results 91-100; total: 373000
>
> On the regular Google search, total: 2,050,000 (for every page, of
> course).
>
> Besides that, the first and third results on the regular Google search
> do not appear in the 100 results from the API for this query, but this
> is not typical; more like 1 chance in 10 :-/
>
> So, no matter how much Google insists that this parrot is sleeping,
> it's simply dead.
>
>
> Now, what I presume is happening is that they have a dozen machine
> pools, and each one has a broken snapshot of the production index
> (probably they have some process that imports the index, and either it
> blows up at some point or they simply kill it after some time). And
> they obviously don't run that process very often.
>
> Now... does anyone have an implementation of pygoogle.py that scrapes
> the regular HTML service instead of using SOAP? :)
> 
> Gabriel B.
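
For the intermittent 502s, a retry wrapper is the usual workaround. This is
only a sketch, assuming the search call is wrapped in a function that raises
an exception on "502 bad gateway". `BadGateway`, `call_with_retry` and the
fake `flaky_search` are all hypothetical names for illustration; none of them
is pygoogle's API, and no network request is made.

```python
import time


class BadGateway(Exception):
    """Hypothetical error standing in for a '502 bad gateway' response."""


def call_with_retry(func, attempts=3, delay=0.0):
    """Call func(); on BadGateway, wait `delay` seconds and try again.

    Re-raises the last BadGateway if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except BadGateway:
            if attempt == attempts:
                raise
            time.sleep(delay)


def make_flaky_search(failures):
    """Build a fake search that fails `failures` times, then returns a total."""
    state = {"calls": 0}

    def flaky_search():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise BadGateway("502 bad gateway")
        return 373000  # one of the totals from the quoted message

    return flaky_search, state


search, state = make_flaky_search(failures=1)
total = call_with_retry(search, attempts=3)
print("total %d after %d call(s)" % (total, state["calls"]))
```

Retrying hides the gateway errors but does nothing about the mismatched
totals, since a retry may simply land on a different snapshot.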




More information about the Python-list mailing list