[Tutor] Accessing a Website

Prasad, Ramit ramit.prasad at jpmorgan.com
Thu Jul 12 21:06:09 CEST 2012


> > My pseudocode is the following
> >
> > new_dictionary = {}
> > for name in file:
> >  #1) log into university account
> >  #2) go to website with data
> >  #3) type in search box: name
> >  #4) click search
> >  #5) if name is exact match with name of one of the hits:
> >     line.find("Code Number")
> >     #6) remove the number directly after "Code Number: " and stop at the
> >     #   next space
> >     new_dictionary[name] = code_number
> >
> > With the exception of step 6, I'm not quite sure how to do this in Python.
> > Is it very complicated to write a script that logs onto a website that
> > requires a user name and password that I have, and then repeatedly enters
> > names and gets their associated IDs that we want?  I used to work at a
> > cancer lab where we decided we couldn't do this kind of thing to search
> > PubMed, and that a human would be more accurate even though our criterion was
> > simply (is there survival data?).  I don't think that this has to be the
> > case here, but would greatly appreciate any guidance.
> >

> There are a couple of modules (urllib, urllib2) in Python, and another
> one that I like called requests, that let your program access a
> website.  (http://docs.python-requests.org/en/latest/index.html)
> You can read the examples and see if this will help you on your way.
> Also, go to the webpage with the search box, and when you enter a
> search term and submit it, see what the URL looks like after
> submitting.  If it is a GET request, your search parameter will be
> at the tail of the URL.  If it is, you can create those URLs in your
> code and request the results (with the requests module).
> There is a great module called Beautiful Soup (use version 4) that can
> help you parse through your results.
>
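
As a rough illustration of the suggestion above (not code from this
thread), a GET search URL can be assembled with the standard library and
the code number pulled out with a regular expression. The base URL, the
parameter name "q", and the helper names are all hypothetical; the real
ones depend on what the site's URL looks like after you submit a search.
The sketch targets Python 3, where urlencode lives in urllib.parse.

```python
import re
from urllib.parse import urlencode

# Hypothetical values: read the real ones off the URL your browser shows
# after you submit a search on the actual site.
BASE_URL = "https://example.edu/search"
SEARCH_PARAM = "q"

# Step 6 of the pseudocode: take whatever follows "Code Number: " up to
# the next whitespace character.
CODE_RE = re.compile(r"Code Number:\s*(\S+)")


def build_search_url(name, base_url=BASE_URL, param=SEARCH_PARAM):
    """Return the GET URL for one search term, with the term URL-encoded."""
    return base_url + "?" + urlencode({param: name})


def extract_code_number(text):
    """Return the token after 'Code Number: ', or None if it is absent."""
    match = CODE_RE.search(text)
    return match.group(1) if match else None


def lookup(name):
    """Fetch the results page for `name` and return its code number.

    Needs the third-party requests package; imported here so the pure
    helpers above work without it.
    """
    import requests

    response = requests.get(build_search_url(name), timeout=30)
    response.raise_for_status()
    return extract_code_number(response.text)
```

If the results page lists several hits, a plain regular expression will
only return the first code number on the page. In that case an HTML parser
such as Beautiful Soup is the better tool for picking out the hit whose
name matches exactly.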

You can do that, but it is going to be difficult and I am not even sure
how you would pass things like POST arguments in this manner.
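
For what it is worth, the requests module mentioned above accepts form
fields for a POST through its "data" argument, and a Session object keeps
the login cookies between calls. A minimal sketch, assuming the site uses
an ordinary HTML login form; the URLs and the field names ("username",
"password", "name") are hypothetical and would have to be read from the
real form's HTML:

```python
def make_login_payload(username, password):
    """Build the form fields for the login POST.

    The field names are assumptions; check the <input name="..."> attributes
    in the real login form.
    """
    return {"username": username, "password": password}


def search_after_login(login_url, search_url, username, password, name):
    """Log in, then POST one search term using the same session.

    Returns the HTML of the results page as text. Needs the third-party
    requests package.
    """
    import requests

    with requests.Session() as session:
        # The session stores any cookies set by the login response and
        # sends them back on the next request.
        login = session.post(
            login_url, data=make_login_payload(username, password), timeout=30
        )
        login.raise_for_status()

        # "name" is a hypothetical field name for the search box.
        result = session.post(search_url, data={"name": name}, timeout=30)
        result.raise_for_status()
        return result.text
```

This will not be enough for logins that depend on JavaScript or on hidden
one-time tokens in the form; those take extra work.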

Hopefully this gives you some ideas: 
http://www.akasig.org/2004/12/29/web-scraping-with-python-part-1-crawling/
(more concisely http://www.akasig.org/2004/09/03/web-scraping-with-python/ )


or http://stackoverflow.com/questions/2081586/web-scraping-with-python


You would be better off asking this question on the main Python group,
as this group is much smaller and focused on teaching Python.
The main group is much larger and more likely to be able to help.




Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--


> -----Original Message-----
> From: tutor-bounces+ramit.prasad=jpmorgan.com at python.org [mailto:tutor-
> bounces+ramit.prasad=jpmorgan.com at python.org] On Behalf Of Joel Goldstick
> Sent: Thursday, July 12, 2012 1:43 PM
> To: Fred G
> Cc: tutor at python.org
> Subject: Re: [Tutor] Accessing a Website
> 
> On Thu, Jul 12, 2012 at 2:03 PM, Fred G <bayespokerguy at gmail.com> wrote:
> > Hi--
> >
> > Thanks so much.
> >
> > _______________________________________________
> > Tutor maillist  -  Tutor at python.org
> > To unsubscribe or change subscription options:
> > http://mail.python.org/mailman/listinfo/tutor
> > 
> 
> --
> Joel Goldstick