A problem while using urllib
Johnny Lee
johnnyandfiona at hotmail.com
Wed Oct 12 06:53:46 EDT 2005
Steve Holden wrote:
> Steve Holden wrote:
> > Johnny Lee wrote:
> > [...]
> >
> >>I've sent the source, thanks for your help.
> >>
> >
> > [...]
> > Preliminary result, in case this rings bells with people who use urllib2
> > quite a lot. I modified the error case to report the actual message
> > returned with the exception and I'm seeing things like:
> >
> > http://www.holdenweb.com/./Python/webframeworks.html
> > Message: <urlopen error (120, 'Operation already in progress')>
> > Start process
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Error: IOError while parsing
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Message: <urlopen error (120, 'Operation already in progress')>
> > .
> > .
> > .
> >
> > So at least we know now what the error is, and it looks like some sort
> > of resource limit (though why only on Cygwin beats me) ... anyone,
> > before I start some serious debugging?
> >
> I realized after this post that WingIDE doesn't run under Cygwin, so I
> modified the code further to raise an error and give us a proper
> traceback. I also tested the program under the standard Windows 2.4.1
> release, where it didn't fail, so I conclude you have unearthed a Cygwin
> socket bug. Here's the traceback:
>
> End process http://www.holdenweb.com/contact.html
> Start process http://freshmeat.net/releases/192449
> Error: IOError while parsing http://freshmeat.net/releases/192449
> Message: <urlopen error (120, 'Operation already in progress')>
> Traceback (most recent call last):
>   File "Spider_bug.py", line 225, in ?
>     spider.run()
>   File "Spider_bug.py", line 143, in run
>     self.grabUrl(tempUrl)
>   File "Spider_bug.py", line 166, in grabUrl
>     webPage = urllib2.urlopen(url).read()
>   File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
>     return _opener.open(url, data)
>   File "/usr/lib/python2.4/urllib2.py", line 358, in open
>     response = self._open(req, data)
>   File "/usr/lib/python2.4/urllib2.py", line 376, in _open
>     '_open', req)
>   File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
>     result = func(*args)
>   File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
>     return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
>     raise URLError(err)
> urllib2.URLError: <urlopen error (120, 'Operation already in progress')>
>
> Looking at that part of the source of urllib2 we see:
>
>         headers["Connection"] = "close"
>         try:
>             h.request(req.get_method(), req.get_selector(), req.data,
>                       headers)
>             r = h.getresponse()
>         except socket.error, err: # XXX what error?
>             raise URLError(err)
>
> So my conclusion is that there's something in the Cygwin socket module
> that causes problems not seen under other platforms.
>
> I couldn't find any obviously-related error in the Python bug tracker,
> and I have copied this message to the Cygwin list in case someone there
> knows what the problem is.
>
> Before making any kind of bug submission you should really see if you
> can build a program shorter than the existing 220+ lines to demonstrate
> the bug, but it does look to me like your program should work (as indeed
> it does on other platforms).
>
> regards
> Steve
> --
> Steve Holden +44 150 684 7255 +1 800 494 3119
> Holden Web LLC www.holdenweb.com
> PyCon TX 2006 www.python.org/pycon/
But if you change urllib2 to urllib, it works under Cygwin. Are they
using a different mechanism to connect to the page?
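
For reference, a shorter test program of the kind Steve suggests might
look something like the sketch below. It is not taken from Spider_bug.py;
the function name fetch_all and its opener parameter are made up for
illustration. It simply fetches a list of URLs one after another and
records any IOError (urllib2.URLError is a subclass of IOError), so the
same loop can be run once with urllib2.urlopen and once with
urllib.urlopen to compare the two under Cygwin.

```python
# Minimal sketch of a reproducer. fetch_all and opener are hypothetical
# names, not part of the original spider.
import sys

try:
    from urllib2 import urlopen          # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen   # equivalent location in Python 3


def fetch_all(urls, opener=urlopen):
    """Fetch each URL in turn; return a list of (url, error_or_None)."""
    results = []
    for url in urls:
        try:
            page = opener(url)
            try:
                page.read()
            finally:
                page.close()
        except IOError:
            # URLError derives from IOError, so this also catches
            # <urlopen error (120, 'Operation already in progress')>.
            # sys.exc_info() keeps the syntax valid on old and new Pythons.
            results.append((url, sys.exc_info()[1]))
        else:
            results.append((url, None))
    return results
```

Calling fetch_all with the URLs from the log above (for example
http://www.holdenweb.com/contact.html followed by
http://freshmeat.net/releases/192449) and printing the result should show
which requests fail, assuming the problem does not depend on the rest of
the spider. Passing opener=urllib.urlopen runs the same loop through
urllib instead.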