is this possible (getting info off web page)

bruce bedouglas at earthlink.net
Thu Apr 9 17:57:09 EDT 2009


Hi Esmail.

I haven't looked at the site. However, I can give you some general pointers
that might help you solve your issue.

First off, try to accomplish your goal using curl, or one of the other
command-line apps that fetch page data. This lets you quickly nail down any
issues that might occur, and see whether it's even possible to fetch the page.

But even before you attempt curl, you might need to analyze the page and the
process of fetching it. If you try a simple fetch using curl and you don't
get your page, then it might be a matter of using cookies and/or setting the
user-agent, or any number of other things.
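
The cookie and user-agent point can be sketched roughly as follows. This
assumes Python 3, where urllib2 became urllib.request. The helper names
(build_browser_like_opener, fetch) and the User-Agent string are made up for
illustration; nothing here is known to be what any particular site requires,
and a site that blocks bots may still refuse the request.

```python
import http.cookiejar
import urllib.request

# Hypothetical User-Agent value; substitute whatever your own browser sends.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleFetcher/0.1"


def build_browser_like_opener(user_agent=USER_AGENT):
    """Return (opener, cookie_jar): keeps cookies and sends a custom User-Agent."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, cookie_jar


def fetch(opener, url, timeout=30):
    """Fetch url and return the body as text (needs network access)."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Cookies set by one response are stored in the jar and sent back on later
requests made through the same opener, which is roughly what a browser does.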

Page/traffic analysis will let you see what traffic is passing
between the server and your browser. If you're using Firefox, you can get a
plugin (livehttpheaders) that shows you what's going on. It
will tell you whether cookies are being used/set, whether you're using a
straight GET/POST process, whether you're being redirected, etc.
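
If you'd rather do a crude version of that analysis from a script, here is a
minimal sketch, again assuming Python 3. The function names
(summarize_exchange, describe) are hypothetical. It relies on the fact that
http.client does not follow redirects on its own, so each hop can be inspected
separately.

```python
import http.client


def summarize_exchange(status, headers):
    """Summarize one HTTP response: redirect target and cookies set.

    `headers` is a list of (name, value) pairs, in the shape returned by
    http.client.HTTPResponse.getheaders().
    """
    # Keep only the "name=value" part of each Set-Cookie header.
    cookies = [v.split(";", 1)[0] for k, v in headers if k.lower() == "set-cookie"]
    location = next((v for k, v in headers if k.lower() == "location"), None)
    return {
        "status": status,
        "is_redirect": status in (301, 302, 303, 307, 308) and location is not None,
        "redirect_to": location,
        "cookies_set": cookies,
    }


def describe(host, path="/"):
    """Make a single GET request (needs network access) and summarize the reply."""
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return summarize_exchange(response.status, response.getheaders())
    finally:
        conn.close()
```

Calling describe() on the host and path you care about, then again on whatever
redirect_to reports, gives a hop-by-hop view similar to what the browser plugin
shows.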

Once you have a better understanding of the process required, you can then
set up curl to fetch the page. If you get the page with curl, you can then
proceed to using python/urllib2/mech/cookies/etc. to fetch the page.
And once you have the page, you can then play with XPath to
manipulate/drive/parse the DOM and get the information you want.
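
As a sketch of the XPath step: the markup below is invented for illustration
and is not what any real site returns, and best_prices is a hypothetical
helper. It uses the limited XPath subset in the standard library's
xml.etree.ElementTree, which only works on well-formed markup; real-world HTML
usually needs a more forgiving parser such as the third-party lxml package.

```python
import xml.etree.ElementTree as ET

# Invented sample page; a real results page will be structured differently.
SAMPLE_PAGE = """
<html><body>
  <div class="result"><span class="price">$312</span></div>
  <div class="result"><span class="price">$287</span></div>
  <div class="result"><span class="price">$450</span></div>
  <div class="result"><span class="price">$299</span></div>
</body></html>
"""


def best_prices(page_text, count=3):
    """Return the `count` lowest prices found in the page, as integers."""
    root = ET.fromstring(page_text.strip())
    # Select every price span that sits directly inside a result div.
    spans = root.findall(".//div[@class='result']/span[@class='price']")
    prices = [int(span.text.strip().lstrip("$")) for span in spans]
    return sorted(prices)[:count]


print(best_prices(SAMPLE_PAGE))  # prints [287, 299, 312]
```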

Have fun!!!



-----Original Message-----
From: python-list-bounces+bedouglas=earthlink.net at python.org
[mailto:python-list-bounces+bedouglas=earthlink.net at python.org]On Behalf
Of Esmail
Sent: Thursday, April 09, 2009 2:47 PM
To: python-list at python.org
Subject: is this possible (getting info off web page)


Hi,

I am trying to keep track of two flight bookings on the kayak.com
web site, and would like to automate my query using Python. If I
enter the url below myself into the browser,  a form gets filled
out and the site searches for flights. I would love to be able to
have a simple python script which I can supply with a url and
have it simply display, in text if possible, the best 3 prices.

I thought this would be a great opportunity to learn more about Python, and
then presumably the re module as I parse the text....

However, when I use the code below (located via google), I end up
with an alternate page, basically telling me that kayak doesn't like
bots. So, it seems like Kayak has it set up to prevent automated
access?

Is there any way to do what I would like to do?

Thanks,

Esmail

--

import urllib2

# The assignment is wrapped in parentheses so the long URL can span two lines.
url = ('http://www.kayak.com/s/search/air'
       '?l1=cmh&d1=4/23/2009&l2=yyz&d2=4/26/2009')
res = urllib2.urlopen(url)
page = res.read()

--
http://mail.python.org/mailman/listinfo/python-list



