[Python-bugs-list] [ python-Bugs-626543 ] urllib2 doesn't do HTTP-EQUIV & Refresh
noreply@sourceforge.net
Wed, 23 Oct 2002 06:54:43 -0700
Bugs item #626543, was opened at 2002-10-21 22:57
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=626543&group_id=5470
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 doesn't do HTTP-EQUIV & Refresh
Initial Comment:
I just added support for HTML's META HTTP-EQUIV and
zero-time Refresh HTTP headers to my 'ClientCookie'
package (which exports essentially a clone of the
urllib2 interface that knows about cookies, making use
of urllib2 in the implementation). I didn't make a
patch for urllib2 itself but it would be easy to do so.
I don't plan to do this immediately, but will
eventually (assuming Jeremy thinks it's advisable) -- I
just wanted to register this fact to prevent
duplication of effort.
[BTW, this version of ClientCookie isn't on my web page
yet -- my motherboard just died.]
I'm sure you know this already, but: HTTP-EQUIV is just
a way of putting headers in the HEAD section of an HTML
document; Refresh is a Netscape 1.1 header that
indicates that a browser should redirect after a
specified time. Refresh headers with zero time act
like redirections.
The net result of the code I just wrote is that if you
urlopen a URL that points to an HTML document like
this:
<HTML><HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0;
URL=http://acme.com/new_url.htm">
</HEAD></HTML>
you're automatically redirected to
"http://acme.com/new_url.htm". Same thing happens if
the Refresh is in the HTTP headers, because all the
HTTP-EQUIV headers are treated like real HTTP headers.
Refresh with non-zero delay time is ignored (the
urlopen returns the document body unchanged and does
not redirect, but does still add the Refresh header to
the HTTP headers).
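To make the behaviour described above concrete, here is a minimal sketch in modern Python. It is not the ClientCookie code: the names parse_refresh, HttpEquivParser and zero_refresh_target are hypothetical, and the use of html.parser and urljoin is an assumption about how one might implement it. It only extracts the redirect target; actually re-fetching the URL is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def parse_refresh(value):
    """Split a Refresh value such as '0; URL=http://x/' into (delay, url).

    Returns None if the delay is not a number. The url is None when the
    header carries only a delay.
    """
    delay_text, _, rest = value.partition(";")
    try:
        delay = float(delay_text.strip())
    except ValueError:
        return None
    rest = rest.strip()
    if rest[:4].lower() == "url=":
        rest = rest[4:].strip()
    return delay, (rest.strip("'\"") or None)


class HttpEquivParser(HTMLParser):
    """Collect (name, content) pairs from <META HTTP-EQUIV=...> tags."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("http-equiv")
        content = attr.get("content")
        if name and content is not None:
            self.headers.append((name, content))


def zero_refresh_target(html_text, base_url):
    """Return the absolute redirect URL for a zero-delay Refresh, else None."""
    parser = HttpEquivParser()
    parser.feed(html_text)
    for name, content in parser.headers:
        if name.lower() != "refresh":
            continue
        parsed = parse_refresh(content)
        # Only zero-delay refreshes with a target act like redirections.
        if parsed and parsed[0] == 0 and parsed[1]:
            return urljoin(base_url, parsed[1])
    return None
```

Fed the example document above, zero_refresh_target should return "http://acme.com/new_url.htm"; with a non-zero delay it returns None, so the caller would hand back the body unchanged.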
A few issues:
0) AFAIK, the Refresh header is not specified in any
RFC, but only here:
http://wp.netscape.com/assist/net_sites/pushpull.html
(HTTP-EQUIV seems to be in the HTML 4.0 standard, maybe
earlier ones too)
1) Infinite loops should be detected, as for HTTP 30x?
Presumably yes.
2) Should add HTTP-EQUIV headers to response object, or
just treat them like headers internally? Perhaps it
should be possible to get both behaviours?
3) Bug in my implementation: it reads body data greedily
from httplib's file object.
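For issue 1, one possible shape for loop detection is sketched below. This is an illustration only, not what urllib2 does for 30x responses: follow_refreshes, RedirectLoopError, the next_target callable and the max_hops default are all hypothetical. next_target is assumed to take a URL and return the zero-delay Refresh target found there, or None if there is none.

```python
class RedirectLoopError(Exception):
    """Raised when refresh redirects revisit a URL or run too long."""


def follow_refreshes(start_url, next_target, max_hops=10):
    """Follow zero-delay refreshes from start_url and return the final URL.

    next_target(url) is a caller-supplied function (an assumption of this
    sketch) returning the next URL to visit, or None to stop.
    """
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        target = next_target(url)
        if target is None:
            return url
        if target in seen:
            raise RedirectLoopError("refresh loop via %s" % target)
        seen.add(target)
        url = target
    # Guard against long non-repeating chains as well as true cycles.
    raise RedirectLoopError("more than %d refresh redirects" % max_hops)
```

Tracking visited URLs catches true cycles, while the hop limit bounds chains that never repeat.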
John
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2002-10-23 15:54
Message:
Logged In: YES
user_id=21627
In addition to the issues you have mentioned, there is also
the backwards compatibility issue: Some applications might
expect to get a meta-refresh document from urllib, then parse
it and retry themselves. Those applications would break with
such a change.
----------------------------------------------------------------------