[Tutor] Accessing the Web using Python (fwd)

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Wed Feb 12 02:59:05 2003


Let me forward this to the rest of the list, so that everyone knows what
the situation is.

(Also, please don't call me Mr. Yoo.  Just "Danny" is fine.  Dear gosh, am
I that old already?  *grin*)

Best of wishes to you!

---------- Forwarded message ----------
Date: Fri, 23 Aug 2002 03:19:35 -0500
From: Henry Steigerwaldt <hsteiger@comcast.net>
To: Danny Yoo <dyoo@hkn.eecs.berkeley.edu>
Subject: Re: [Tutor] Accessing the Web using Python

Mr. Yoo:

I think you are on to something here.

I just realized that after I get the data a couple
of times, like I just did earlier tonight, the access then
stops.

I'll bet a program monitors how many times someone
comes into the site. If they do so too often in a short
period of time, it freezes them out for a while.

I'll bet that is it. I had just finished writing a while loop
that tried to access that site over and over again to see
if I could get in. It must have looped
through 30 times!

WOW! I'll stop that immediately! I would assume
that is what hackers have done to try to crash
a site.
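In hindsight, if I ever test a site in a loop again, the loop should at
least pause between attempts. Here is a rough sketch of a politer retry
helper (the helper name, retry counts, and 3-second delay are my own
invention, and the code uses the modern urllib.request spelling):

```python
import time
import urllib.request

fwcURL = "http://isl715.nws.noaa.gov/tdl/forecast/fwc.txt"

def fetch_with_retries(fetch, attempts=3, delay=3.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, sleeping `delay` seconds
    between tries instead of hammering the server in a tight loop."""
    for attempt in range(attempts):
        if attempt > 0:
            sleep(delay)  # be polite: wait before retrying
        try:
            return fetch()
        except OSError:
            print("Attempt %d failed" % (attempt + 1))
    return None  # every attempt failed

# Usage (hits the network, so commented out here):
# data = fetch_with_retries(lambda: urllib.request.urlopen(fwcURL).read())
```

Passing the fetch function in as a parameter also makes the retry logic
easy to test without touching the network at all.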

If this is the problem, and it sounds like it is, I'll
just get the data later (or tomorrow) when I can again
get into the site, store it to a file, and THEN
do more work on my program by using the data
stored on my PC.
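Concretely, that store-it-to-a-file plan might look like this sketch (the
cache filename and helper name are my own choices, and the code uses the
modern urllib.request spelling):

```python
import os
import urllib.request

fwcURL = "http://isl715.nws.noaa.gov/tdl/forecast/fwc.txt"

def get_forecast(url=fwcURL, cache="fwc_cache.txt", fetch=None):
    """Return the forecast text, going to the web only when no local
    copy exists yet; later runs read the saved file instead."""
    if os.path.exists(cache):
        with open(cache, "rb") as f:   # reuse the stored copy
            return f.read()
    if fetch is None:
        fetch = lambda: urllib.request.urlopen(url).read()
    data = fetch()                     # one trip to the site...
    with open(cache, "wb") as f:       # ...then save it for next time
        f.write(data)
    return data
```

With this, the program only bothers the server once, and every later run
works entirely from the file on my PC.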

I'll try your suggestion in a few minutes.

Thanks.

Henry Steigerwaldt

----- Original Message -----
From: "Danny Yoo" <dyoo@hkn.eecs.berkeley.edu>
To: "Henry Steigerwaldt" <hsteiger@comcast.net>
Cc: <tutor@python.org>
Sent: Tuesday, February 11, 2003 10:04 PM
Subject: Re: [Tutor] Accessing the Web using Python


>
>
> On Fri, 23 Aug 2002, Henry Steigerwaldt wrote:
>
>
> > import urllib
> >
> > fwcURL = "http://isl715.nws.noaa.gov/tdl/forecast/fwc.txt"
> >
> > try:
> >    print "Going to Web for data"
> >    fwcall = urllib.urlopen(fwcURL).read()
> >    print "Successful"
> >    print "Will now print all of the data to screen"
> >    print "fwcall = ", fwcall
> > except:
> >    print "Could not obtain data from Web"
> > ______________________________________________
> > Using this code the previous day, I had absolutely no problem
> > getting the data. However, yesterday, using the same code,
> > this site could not be accessed at all most of the time.
>
> [some text cut]
>
> > I also noticed that a few times I would be able to access the site (i.e.
> > the "Successful" would print to the screen), but each time absolutely
> > NOTHING would be stored in the "fwcall" variable, unlike successful
> > times when all the text information WAS stored in the variable.
>
>
>
> Hi Henry,
>
>
> We may need some more information; at the moment, the bare except clause
> is hiding the details that exceptions can provide.  Let's enable some more
> diagnostics.  Can you change the except block to something like:
>
> ###
> except:
>     print "Could not obtain data from Web"
>     traceback.print_exc()
> ###
>
> You'll probably need to import the 'traceback' module for this.  The
> additional line, that "traceback.print_exc()", will print out more
> information about the exception itself, and should give us insight into
> what exactly is causing the magic to fizzle.
>
>
>
>
> > I just tried this same code tonight and once again it works great! I am
> > really puzzled by all this. When one writes a program to access the Web,
> > as long as the site accessed is not "down," one should anticipate always
> > being able to get the data.
>
>
> It actually depends on the service that the web site provides!  For
> example, the National Center for Biotechnology Information (NCBI) provides
> a set of valuable online programs and services for biologists:
>
>     http://www.ncbi.nlm.nih.gov/
>
>
> But, despite the electronic nature of NCBI, there is a kind of scarcity
> involved here: namely, they need to maintain a service that's available to
> scientists in a timely fashion, and some of the services they provide are
> computationally very expensive.  What to do?
>
>
> NCBI has a cap, a kind of rate limiter, that limits how many requests they
> handle from a single computer at a time.  That is, NCBI will block web
> requests of anyone who tries to abuse their public resource.  As an
> example, here's what their guidelines dictate:
>
>
> """
>     Do not overload NCBI's systems. Users intending to send numerous
>     queries and/or retrieve large numbers of records from Entrez should
>     comply with the following:
>
>     * Run retrieval scripts on weekends or between 9 PM and 5 AM ET
>       weekdays for any series of more than 100 requests.
>
>     * Make no more than one request every 3 seconds.
> """
>
>
> And they are serious.  I accidentally ran a program once that hammered
> their systems.  It is not a Good Thing when your computer is blacklisted
> from a national public resource.  *cough*
>
>
>
> But that's NCBI; I don't know if the National Weather Service applies a
> similar rate-limiter on their services.  So let's see what the
> traceback.print_exc() gives us in your program above, and we'll work from
> there.
>
>
>
> Good luck!
>
>