[Python-bugs-list] [ python-Bugs-626543 ] urllib2 doesn't do HTTP-EQUIV & Refresh

noreply@sourceforge.net noreply@sourceforge.net
Wed, 23 Oct 2002 06:54:43 -0700


Bugs item #626543, was opened at 2002-10-21 22:57
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=626543&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John J Lee (jjlee)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 doesn't do HTTP-EQUIV & Refresh

Initial Comment:
I just added support for HTML's META HTTP-EQUIV and
zero-time Refresh HTTP headers to my 'ClientCookie'
package (which exports essentially a clone of the
urllib2 interface that knows about cookies, making use
of urllib2 in the implementation).  I didn't make a
patch for urllib2 itself but it would be easy to do so.
I don't plan to do this immediately, but will
eventually (assuming Jeremy thinks it's advisable) -- I
just wanted to register this fact to prevent
duplication of effort.

[BTW, this version of ClientCookie isn't on my web page
yet -- my motherboard just died.]

I'm sure you know this already, but: HTTP-EQUIV is just
a way of putting headers in the HEAD section of an HTML
document; Refresh is a Netscape 1.1 header that
indicates that a browser should redirect after a
specified time.  Refresh headers with zero time act
like redirections.

The net result of the code I just wrote is that if you
urlopen a URL that points to an HTML document like
this:

<HTML><HEAD>
<META HTTP-EQUIV="Refresh" CONTENT="0; URL=http://acme.com/new_url.htm">
</HEAD></HTML>

you're automatically redirected to
"http://acme.com/new_url.htm".  Same thing happens if
the Refresh is in the HTTP headers, because all the
HTTP-EQUIV headers are treated like real HTTP headers.
Refresh with non-zero delay time is ignored (the
urlopen returns the document body unchanged and does
not redirect, but does still add the Refresh header to
the HTTP headers).
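
To make the behaviour above concrete, here is a minimal sketch in
present-day Python 3 using only the standard library. It is not the
ClientCookie code; the names MetaHttpEquivParser, parse_refresh and
zero_time_redirect are made up for illustration, and the handling of
the Refresh value is an assumption based on the "0; URL=..." form
shown above.

```python
from html.parser import HTMLParser


class MetaHttpEquivParser(HTMLParser):
    """Collect (name, content) pairs from <META HTTP-EQUIV=... CONTENT=...>."""

    def __init__(self):
        super().__init__()
        self.http_equiv = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("http-equiv")
        content = attr_map.get("content")
        if name and content is not None:
            self.http_equiv.append((name, content))


def parse_refresh(value):
    """Split a Refresh value such as '0; URL=http://...' into (delay, url).

    Returns (None, None) if the delay is not a number (assumed policy).
    """
    delay_part, _, rest = value.partition(";")
    try:
        delay = float(delay_part.strip())
    except ValueError:
        return None, None
    rest = rest.strip()
    url = None
    if rest[:4].lower() == "url=":
        url = rest[4:].strip().strip("\"'")
    return delay, url


def zero_time_redirect(html):
    """Return the target of a zero-time META Refresh, or None.

    A Refresh with a non-zero delay is ignored, as described above.
    """
    parser = MetaHttpEquivParser()
    parser.feed(html)
    for name, content in parser.http_equiv:
        if name.lower() == "refresh":
            delay, url = parse_refresh(content)
            if delay == 0 and url:
                return url
    return None
```

parse_refresh can be applied unchanged to a real Refresh HTTP header
value, since the HTTP-EQUIV content uses the same syntax.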

A few issues:

0) AFAIK, the Refresh header is not specified in any
RFC, but only here:

http://wp.netscape.com/assist/net_sites/pushpull.html

(HTTP-EQUIV seems to be in the HTML 4.0 standard, maybe
earlier ones too)

1) Should infinite loops be detected, as for HTTP 30x?
   Presumably yes.

2) Should HTTP-EQUIV headers be added to the response
   object, or just treated like headers internally?
   Perhaps it should be possible to get both behaviours?

3) Bug in my implementation: it is greedy when reading
   body data from httplib's file object.
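
For issue 1, loop detection could look roughly like the sketch below
(Python 3, standard library only). This is an illustration, not what
urllib2 does for 30x responses: follow_refreshes, RefreshLoopError,
the max_hops limit of 10 and the get_refresh_target callable are all
hypothetical. get_refresh_target(url) stands in for "fetch the URL and
return the zero-time Refresh target, or None if there is none".

```python
from urllib.parse import urljoin


class RefreshLoopError(Exception):
    """Raised when zero-time refreshes loop or exceed the hop limit."""


def follow_refreshes(url, get_refresh_target, max_hops=10):
    """Follow zero-time refreshes starting at url and return the final URL.

    get_refresh_target is a caller-supplied (hypothetical) callable that
    returns the refresh target for a URL, or None when there is none.
    """
    visited = {url}
    hops = 0
    while True:
        target = get_refresh_target(url)
        if target is None:
            return url
        if hops >= max_hops:
            raise RefreshLoopError("more than %d refreshes" % max_hops)
        # Refresh targets may be relative, so resolve against the current URL.
        target = urljoin(url, target)
        if target in visited:
            raise RefreshLoopError("refresh loop at %s" % target)
        visited.add(target)
        url = target
        hops += 1
```

Keeping both a visited set and a hop limit catches direct loops early
and still bounds chains that never revisit a URL.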


John


----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-10-23 15:54

Message:
Logged In: YES 
user_id=21627

In addition to the issues you have mentioned, there is also 
the backwards compatibility issue: Some applications might 
expect to get a meta-refresh document from urllib, then parse 
it and retry themselves. Those applications would break with 
such a change.
