Finding Default Page Name using urllib2

barrett stephen.p.barrett at gmail.com
Mon Oct 27 12:17:04 EDT 2008


Is there a way to find the name of a page you are retrieving using
Python? For example, if I get http://www.cnn.com/, I want to know that
the page is index.html. I can do this with wget, as seen in the output
below. Can I do this in Python?

Thanks,

$ wget cnn.com
--11:15:25--  http://cnn.com/
           => `index.html'
Resolving cnn.com... 157.166.226.25, 157.166.226.26, 157.166.224.25, ...
Connecting to cnn.com|157.166.226.25|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.cnn.com/ [following]
--11:15:25--  http://www.cnn.com/
           => `index.html'
Resolving www.cnn.com... 157.166.224.25, 157.166.224.26, 157.166.226.25, ...
Reusing existing connection to cnn.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 96,094 (94K) [text/html]

100%[====================================>] 96,094        68.15K/s

11:15:28 (67.99 KB/s) - `index.html' saved [96094/96094]
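Note that the server never sends a "page name" here. wget picks `index.html` itself, because the URL path is empty or ends in a slash. A minimal sketch of doing the same in Python, assuming Python 3 (where urllib2's functionality lives in `urllib.request`): follow the redirects, take the final URL, and use the last path segment, falling back to `index.html`. The helper names `filename_from_url` and `fetch_with_name` are hypothetical, and this only approximates wget's naming rules.

```python
import posixpath
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_url(url, default="index.html"):
    """Guess a local file name for url, roughly the way wget does.

    Hypothetical helper: the name is the last segment of the URL path
    (query string and fragment are ignored); if the path is empty or
    ends with "/", the default "index.html" is used instead.
    """
    path = unquote(urlsplit(url).path)
    name = posixpath.basename(path)
    return name or default


def fetch_with_name(url):
    """Hypothetical helper: fetch url and return (final_url, name, body).

    urlopen follows 301/302 redirects automatically; geturl() (which
    also exists on urllib2 responses) reports the URL actually served.
    """
    with urlopen(url) as response:
        final_url = response.geturl()
        body = response.read()
    return final_url, filename_from_url(final_url), body
```

For example, `filename_from_url("http://www.cnn.com/")` gives `"index.html"`, while `filename_from_url("http://example.com/docs/page.html?x=1")` gives `"page.html"`.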


