Fundamental problem with urllib...

Mon Apr 22 10:03:37 EDT 2002

"Jonathan Hseu" <vomjom at vomjom.org> wrote ...
> Urllib will always have problems with some sites.  Notably, sites with
> cookies that also redirect.
>
> Here's what happens:
> Say, perhaps after POSTing, you want to grab a cookie from the headers.
> The website puts the cookie in a 302 message (redirection for those of
> you who don't know HTTP responses), and then redirects you elsewhere.
> Now, to grab the cookie, you need to be able to get the headers of that
> message before getting redirected.
>
It's true that urllib will not allow the client code to capture cookie
values in the circumstances you describe.
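For comparison, modern Python's urllib.request (not the 2002 urllib discussed in this thread) provides a hook for exactly this case. HTTPRedirectHandler.redirect_request(req, fp, code, msg, headers, newurl) is called with the headers of the 3xx response itself, before the redirect is followed. Here is a minimal sketch. RecordingRedirectHandler and the example.com URLs are hypothetical, and the 302 is simulated offline rather than fetched from a real site.

```python
import email.message
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: keep the headers of each 3xx response it follows."""

    def __init__(self):
        super().__init__()
        self.redirect_headers = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # `headers` belongs to the redirect response itself, so any
        # Set-Cookie it carries is still visible at this point.
        self.redirect_headers.append(headers)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Normal use: opener.open(...) would route every redirect through `recorder`.
recorder = RecordingRedirectHandler()
opener = urllib.request.build_opener(recorder)

# Offline demo: a simulated 302 answering a POST to a hypothetical login URL.
headers = email.message.Message()
headers["Set-Cookie"] = "session=abc123; Path=/"
headers["Location"] = "http://example.com/home"
login = urllib.request.Request("http://example.com/login", data=b"user=demo")
follow_up = recorder.redirect_request(login, None, 302, "Found", headers,
                                      "http://example.com/home")

print(follow_up.get_method(), follow_up.full_url)
print(recorder.redirect_headers[0].get_all("Set-Cookie"))
```

The redirected request is a plain GET without the POST body. The cookie from the 302 survives only because the handler stored it.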

You do not mention it, but I would imagine (without having tested it) that
when the redirect points to a different domain/path combination, a browser
would send back a new set of Cookie headers appropriate to the new URI.
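This guess can be checked against the Netscape-style domain/path matching that Python's http.cookiejar module implements. That module arrived in a later release (as cookielib) and is not part of the urllib discussed here. The sketch below stores a cookie from a simulated 302 and then asks the jar which redirect targets would receive it. FakeResponse, the cookie value, and all URLs are hypothetical. The sketch shows the jar's matching rules, not what any particular browser does.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for a 302 response; CookieJar only calls .info()."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            self._headers["Set-Cookie"] = value  # appends; does not replace

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()

# The 302 from http://example.com/app/login sets a cookie scoped to /app.
login = urllib.request.Request("http://example.com/app/login")
jar.extract_cookies(FakeResponse(["session=abc123; Path=/app"]), login)


def cookie_sent_to(url):
    """Return the Cookie header the jar would attach to a request for url, or None."""
    req = urllib.request.Request(url)
    jar.add_cookie_header(req)
    return req.get_header("Cookie")


for target in ("http://example.com/app/home",           # same host, inside /app
               "http://example.com/other",              # same host, other path
               "http://elsewhere.example.org/app/home"): # different domain
    print(target, "->", cookie_sent_to(target))
```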

> This is certainly not possible without overriding some functions in
> FancyURLopener or declaring some in URLopener (which creates exceptions
> for all HTTP responses != 200).  This isn't a big deal (I did it myself
> for my program), but this can be simply fixed by adding support for
> cookies within urllib.
>
> Shall I take it upon myself to make such a patch?  Comments/Flames?
>
Since urllib knows nothing of cookies, you will need to integrate some sort
of cookie jar into the library, with a new API for clients to retrieve and
store the cookies.
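Python's standard library did later gain this kind of integration: a cookie jar (today's http.cookiejar.CookieJar) that plugs into the opener through urllib.request.HTTPCookieProcessor. This is the later standard-library counterpart, not the specific patch proposed in this thread. The minimal sketch below runs a throwaway local server. The server's /login path answers with a 302 carrying Set-Cookie, and its /home path echoes back whatever Cookie header it receives. LoginHandler and both paths are hypothetical.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie in a 302; /home echoes the Cookie header."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = (self.headers.get("Cookie") or "no cookie").encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),          # ignore environment proxies for localhost
    urllib.request.HTTPCookieProcessor(jar),  # stores and sends cookies, redirects included
)
with opener.open(base + "/login") as resp:
    echoed = resp.read().decode()

server.shutdown()
server.server_close()

print([c.name + "=" + c.value for c in jar])
print("server saw:", echoed)
```

The cookie processor examines each response before the redirect is handled. As a result, the cookie set in the 302 is stored in the jar and then offered to the redirected request according to the jar's domain/path policy.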

> I personally think easy cookie handling within urllib is a good thing.

I agree, but you'll probably have to spend some time finding out what the
existing browser community does, and emulating that.

regards
 Steve
--

home: http://www.holdenweb.com/    book: http://pydish.holdenweb.com/pwp/