urllib.urlretrieve problem

Wade wade at leftwich.us
Thu Mar 31 09:54:45 EST 2005


Diez B. Roggisch wrote:
> It makes no sense for urllib to raise exceptions in such a case. From
> its point of view, things work perfectly: it got a result. No network
> error or anything of the sort.
>
> It's your application that is not happy with the result, but it has to
> figure that out by itself.
>
> You could, for instance, see what kind of result you got using the
> Unix file command; it will tell you that you received an HTML file,
> not a deb.
>
> Or check the MIME type returned: it's text/html in your error case,
> and most probably something like application/octet-stream otherwise.
>
> Regards,
>
> Diez
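
A minimal sketch of the MIME type check described above. It assumes
Python 3, where urlretrieve lives in urllib.request (when this thread was
written it was urllib.urlretrieve). The names is_html_response and
fetch_deb, and the url and filename arguments, are made up for
illustration.

```python
import urllib.request


def is_html_response(headers):
    """Return True if the response headers declare an HTML body.

    `headers` is the message object that urlretrieve returns as the
    second element of its result tuple.
    """
    return headers.get_content_type() == "text/html"


def fetch_deb(url, filename):
    """Download `url` to `filename`, rejecting responses declared as HTML.

    Hypothetical helper: both arguments are placeholders supplied by
    the caller.
    """
    path, headers = urllib.request.urlretrieve(url, filename)
    if is_html_response(headers):
        raise ValueError(
            "expected a package but got %s" % headers.get_content_type()
        )
    return path
```

Note that in Python 3 a genuine 404 already raises urllib.error.HTTPError,
so a check like this mostly helps with servers that answer 200 and send
an HTML page anyway.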

Also be aware that many webservers (especially IIS ones) are configured
to return some kind of custom page instead of a stock 404, and you
might be getting a 200 status code even though the page you requested
is not there. So depending on what site you are scraping, you might
have to read the page you got back to figure out if it's what you
wanted.
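
One way to read what you got back, as a rough sketch: a .deb is an ar
archive, so it should start with the bytes "!<arch>" followed by a
newline, while an error page will usually start with a doctype or an
<html> tag. The function names and the 512-byte sniff size below are my
own choices, not anything urllib provides.

```python
AR_MAGIC = b"!<arch>\n"  # first 8 bytes of any ar archive, including .deb


def looks_like_deb(path):
    """Return True if the file at `path` starts with the ar archive magic."""
    with open(path, "rb") as f:
        return f.read(len(AR_MAGIC)) == AR_MAGIC


def looks_like_html(path, sniff_bytes=512):
    """Heuristic: return True if the start of the file resembles HTML.

    `sniff_bytes` is an arbitrary amount to inspect, chosen for this sketch.
    """
    with open(path, "rb") as f:
        head = f.read(sniff_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Calling looks_like_deb on the downloaded file right after the retrieve
catches the custom-error-page case regardless of the status code or
headers the server sent.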

-- Wade Leftwich
Ithaca, NY



