[Tutor] Forbidden HTML, or not?

EJP ejp at zomething.com
Wed Nov 3 18:27:41 CET 2004


> What do I need to do to get the same result with my code, part of which
> is below:
> 
> import urllib2   # OPENING AND READING HTML
> import re        # SEARCHING THE HTML
> 
> # THIS GIVES "FORBIDDEN" WITH AND WITHOUT THE "?hl=en&gl=us" ATTACHED
> source = urllib2.urlopen('http://news.google.com/nwshp')
> 
> html_text = source.read()


Note in the returned HTML:

"<H1>Forbidden</H1>Your client does not have permission to get URL <code>/</code> from this server."

Google may be refusing the "client", that is, the User-Agent header that urllib2 sends with each request (by default it identifies itself as Python-urllib).  You can try imitating IE by sending the same User-Agent string that IE uses, to see if that makes a difference.
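
As a rough sketch of that idea, the snippet below builds a urllib2 Request that carries a browser-like User-Agent header. The User-Agent string, the constant names, and the helper names build_request() and fetch() are my own assumptions for illustration, not anything from your code, and I have not confirmed that Google will accept this request. The import fallback is only there so the same sketch also runs where urllib2 has become urllib.request.

```python
# Sketch only: the User-Agent string and helper names are illustrative
# assumptions, not taken from the original code.
try:
    import urllib2                    # Python 2, as in the original code
except ImportError:
    import urllib.request as urllib2  # same API under its Python 3 name

URL = 'http://news.google.com/nwshp'
# An IE-style identification string (assumed example value).
BROWSER_UA = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'

def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that announces itself with the given User-Agent."""
    return urllib2.Request(url, headers={'User-Agent': user_agent})

def fetch(url, user_agent=BROWSER_UA):
    """Open the URL with the custom User-Agent and return the raw body."""
    source = urllib2.urlopen(build_request(url, user_agent))
    try:
        return source.read()
    finally:
        source.close()

request = build_request(URL)
# The library stores header names capitalized, hence 'User-agent' here.
print(request.get_header('User-agent'))
```

Building the request does not touch the network; calling fetch(URL) does, and html_text = fetch(URL) would take the place of the urlopen()/read() pair in your code. Whether the server then returns the page instead of the 403 is something you would have to try.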

In a quick try, I got results similar to yours using urllib:

IDLE 1.0.3      
>>> url1="http://news.google.com/"
>>> import urllib
>>> page=urllib.urlopen(url1).read()
>>> len(page)
2224
>>> page
'<html><head><title>403 Forbidden</title><style><!--body {font-family: arial,sans-serif}div.nav {margin-top: 1ex}div.nav A {font-size: 10pt; font-family: arial,sans-serif}span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}div.nav A,span.big {font-size: 12pt; color: #0000cc}div.nav A {font-size: 10pt; color: black}A.l:link {color: #6f6f6f}A.u:link {color: green}//--></style></head><body text=#000000 bgcolor=#ffffff><table border=0 cellpadding=2 cellspacing=0 width=100%><tr><td rowspan=3 width=1% nowrap><b><font face=times color=#0039b6 size=10>G</font><font face=times color=#c41200 size=10>o</font><font face=times color=#f3c518 size=10>o</font><font face=times color=#0039b6 size=10>g</font><font face=times color=#30a72f size=10>l</font><font face=times color=#c41200 size=10>e</font>&nbsp;&nbsp;</b><td>&nbsp;</td></tr><tr><td bgcolor=#3366cc><font face=arial,sans-serif color=#ffffff><b>Error</b></td></tr><tr><td>&nbsp;</td></tr></table><blockquote><H1>Forbidden</H1>Your client does not have permission to get URL <code>/</code> from this server.<p></blockquote><table width=100% cellpadding=0 cellspacing=0><tr><td bgcolor=#3366cc><img alt="" width=1 height=4></td></tr></table></body></html>'


