fetching webpage

charlespina at gmail.com
Thu Dec 29 21:37:48 EST 2005


I went to the URL you posted, and it looks like that error is the
content you should be receiving. Try refreshing your browser cache; you
could be loading a cached page.
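
One way to check both possibilities (the server really returning an error
page, or the server objecting to something in the request) is to look at
the HTTP status code and to send an explicit User-Agent header. The sketch
below is only an illustration and makes several assumptions: it uses
Python 3's urllib.request rather than the urllib module in the quoted
snippet, the names build_request, fetch_lines and BROWSER_USER_AGENT are
made up for this example, and nothing here confirms that the citeseer
server actually inspects the User-Agent.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like identifier (an assumption, not something the
# server is known to require); substitute any realistic value.
BROWSER_USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher)"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_lines(url, timeout=10):
    """Return (status, lines) for url.

    HTTP error responses are returned rather than raised, so a 5xx
    status and the server's own error page can be inspected together.
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # An error reply still has a body: the server's error page.
        status, body = err.code, err.read()
    return status, body.decode("utf-8", errors="replace").split("\n")
```

Calling fetch_lines("http://citeseer.ist.psu.edu") and printing the status
would show whether the text in the list is a genuine server error (a 5xx
code) or an ordinary page. If the status differs depending on the
User-Agent value, that would suggest the server is filtering on it.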

Charles

yookyung wrote:
> I am trying to crawl webpages in the citeseer domain (a collection of research
> papers mostly in computer science).
>
> I have used the following code snippet.
>
> #####
> import urllib
>
> sock = urllib.urlopen("http://citeseer.ist.psu.edu")
> webcontent = sock.read().split('\n')
> sock.close()
> print webcontent
> ########
>
> Then I get the following error message.
>
>
> ['<!--#set var="TITLE" value="Server error!"', '--><!--#include
> virtual="include/top.html" -->', '', '  <!--#if
> expr="$REDIRECT_ERROR_NOTES" -->', '', '    The server encountered an
> internal error and was ', '    unable to complete your request.', '', '
> <!--#include virtual="include/spacer.html" -->', '', '    Error message:', '
> <br /><!--#echo encoding="none" var="REDIRECT_ERROR_NOTES" -->', '', '
> <!--#else -->', '', '    The server encountered an internal error and was ',
> '    unable to complete your request. Either the server is', '    overloaded
> or there was an error in a CGI script.', '', '  <!--#endif -->', '',
> '<!--#include virtual="include/bottom.html" -->', '']
>
> However, the url is valid and it works fine if I open the url in my web
> browser.
> Or, if I use a different url (http://www.google.com instead of
> http://citeseer.ist.psu.edu),
> then it works.
>
> What is wrong?
> Could it be that the citeseer webserver checks the HTTP request, sees
> something that it doesn't like, and rejects the request?
> What should I do?
> 
> Thank you.
> 
> Best regards,
> Yookyung




More information about the Python-list mailing list