[Python-bugs-list] [ python-Bugs-563665 ] urllib2 can't cope with error response

noreply@sourceforge.net
Mon, 03 Jun 2002 09:17:02 -0700


Bugs item #563665, was opened at 2002-06-02 22:28
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=563665&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Erik Demaine (edemaine)
>Assigned to: Jeremy Hylton (jhylton)
>Summary: urllib2 can't cope with error response

Initial Comment:
This looks similar to SF bug 216649, but with somewhat
different symptoms.  Redirection seems to cause an
AttributeError (attempt to access self.fp.read when
self.fp is None).  Simple example:

python -c "import urllib2; urllib2.urlopen('http://www.yahoo.com/promotions/mom_com97/supermom.html')"

Traceback from Python 2.2.1 attached.  Same behavior
appears with Python 2.2.

----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-06-03 16:17

Message:

I haven't looked at 216649 yet, but this particular
traceback is caused by a problem loading the redirected URL.
If you load
http://promotions.yahoo.com/promotions/mom_com97/supermom.html
directly, you'll see the same failure without invoking any
redirect machinery.

My first guess is that the Yahoo server is sending an
invalid response and httplib isn't being generous enough
in skipping the garbage and looking for the valid response
data.  Here's a brief trace of httplib activity:
>>> import httplib
>>> h = httplib.HTTP('promotions.yahoo.com')
>>> h.set_debuglevel(2)
>>> h.putrequest("GET /promotions/mom_com97/supermom.html")   
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: putrequest() takes at least 3 arguments (2 given)
>>> h.putrequest("GET", "/promotions/mom_com97/supermom.html")
connect: (promotions.yahoo.com, 80)
send: 'GET /promotions/mom_com97/supermom.html HTTP/1.0\r\n'
>>> h.endheaders()
send: '\r\n'
>>> h.getreply()
reply: '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n'
(-1, '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n', None)

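Note the -1 status: httplib could not parse the reply line.
I'm not sure what the bytes in front of "HTTP/1.0 200 OK"
(the text starting with a hash) are all about.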
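
As a rough sketch of what "more generous" status-line parsing could
look like (this is not httplib's code; parse_status_line and its
lenient flag are names made up for illustration, and REPLY is the
reply line copied from the trace above):

```python
# The reply line captured in the httplib trace.
REPLY = '#\x0f\x01yhh00000011\x010\x01HTTP/1.0 200 OK\n'


def parse_status_line(line, lenient=False):
    """Return (version, status, reason) or raise ValueError.

    Hypothetical helper: with lenient=True, anything in front of the
    first "HTTP/" marker is thrown away before parsing.
    """
    if lenient:
        start = line.find("HTTP/")
        if start > 0:
            # Skip the leading garbage and keep the real status line.
            line = line[start:]
    if not line.startswith("HTTP/"):
        raise ValueError("bad status line: %r" % (line,))
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("bad status line: %r" % (line,))
    version, status = parts[0], parts[1]
    reason = parts[2] if len(parts) > 2 else ""
    # int() raises ValueError itself if the status is not numeric.
    return version, int(status), reason
```

In strict mode the captured line is rejected; with lenient=True the
same line parses as an ordinary 200 response.  Whether skipping junk
like this is the right behavior for a library is a separate question.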

Of course, urllib2 has a bug that prevents it from reporting
anything useful about this error.  That needs to be fixed.
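
To illustrate the kind of reporting I mean (a sketch only, not the
actual urllib2 fix; BadReplyError and check_reply are hypothetical
names), the caller could test for the -1 status that getreply()
returns for an unparseable reply, and raise an error that carries
the offending line instead of going on to use a file object that
is None:

```python
class BadReplyError(IOError):
    """Hypothetical error type for a reply that could not be parsed."""


def check_reply(reply):
    """Validate a (status, message, headers) tuple shaped like the
    one returned by getreply().

    A status of -1 means the reply line was not valid HTTP; in that
    case the message holds the raw line and there are no headers.
    """
    status, message, headers = reply
    if status == -1:
        # Report the raw line so the user can see what the server sent.
        raise BadReplyError(
            "server sent an invalid response: %r" % (message,))
    return status, message, headers
```

With the tuple from the trace, check_reply() raises BadReplyError and
the message includes the raw reply line, which is far more useful than
an AttributeError on self.fp.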


----------------------------------------------------------------------
