A problem while using urllib

Johnny Lee johnnyandfiona at hotmail.com
Wed Oct 12 06:53:46 EDT 2005


Steve Holden wrote:
> Steve Holden wrote:
> > Johnny Lee wrote:
> > [...]
> >
> >>I've sent the source, thanks for your help.
> >>
> >
> > [...]
> > Preliminary result, in case this rings bells with people who use urllib2
> > quite a lot. I modified the error case to report the actual message
> > returned with the exception and I'm seeing things like:
> >
> > http://www.holdenweb.com/./Python/webframeworks.html
> >     Message: <urlopen error (120, 'Operation already in progress')>
> > Start process
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> > Error: IOError while parsing
> > http://www.amazon.com/exec/obidos/ASIN/0596001886/steveholden-20
> >     Message: <urlopen error (120, 'Operation already in progress')>
> >     .
> >     .
> >     .
> >
> > So at least we know now what the error is, and it looks like some sort
> > of resource limit (though why only on Cygwin beats me) ... anyone,
> > before I start some serious debugging?
> >
> I realized after this post that WingIDE doesn't run under Cygwin, so I
> modified the code further to raise an error and give us a proper
> traceback. I also tested the program under the standard Windows 2.4.1
> release, where it didn't fail, so I conclude you have unearthed a Cygwin
> socket bug. Here's the traceback:
>
> End process http://www.holdenweb.com/contact.html
> Start process http://freshmeat.net/releases/192449
> Error: IOError while parsing http://freshmeat.net/releases/192449
>     Message: <urlopen error (120, 'Operation already in progress')>
> Traceback (most recent call last):
>    File "Spider_bug.py", line 225, in ?
>      spider.run()
>    File "Spider_bug.py", line 143, in run
>      self.grabUrl(tempUrl)
>    File "Spider_bug.py", line 166, in grabUrl
>      webPage = urllib2.urlopen(url).read()
>    File "/usr/lib/python2.4/urllib2.py", line 130, in urlopen
>      return _opener.open(url, data)
>    File "/usr/lib/python2.4/urllib2.py", line 358, in open
>      response = self._open(req, data)
>    File "/usr/lib/python2.4/urllib2.py", line 376, in _open
>      '_open', req)
>    File "/usr/lib/python2.4/urllib2.py", line 337, in _call_chain
>      result = func(*args)
>    File "/usr/lib/python2.4/urllib2.py", line 1021, in http_open
>      return self.do_open(httplib.HTTPConnection, req)
>    File "/usr/lib/python2.4/urllib2.py", line 996, in do_open
>      raise URLError(err)
> urllib2.URLError: <urlopen error (120, 'Operation already in progress')>
>
> Looking at that part of the source of urllib2 we see:
>
>          headers["Connection"] = "close"
>          try:
>              h.request(req.get_method(), req.get_selector(), req.data, headers)
>              r = h.getresponse()
>          except socket.error, err: # XXX what error?
>              raise URLError(err)
>
> So my conclusion is that there's something in the Cygwin socket module
> that causes problems not seen under other platforms.
>
> I couldn't find any obviously-related error in the Python bug tracker,
> and I have copied this message to the Cygwin list in case someone there
> knows what the problem is.
>
> Before making any kind of bug submission you should really see if you
> can build a program shorter than the existing 220+ lines to demonstrate
> the bug, but it does look to me like your program should work (as indeed
> it does on other platforms).
>
> regards
>   Steve
> --
> Steve Holden       +44 150 684 7255  +1 800 494 3119
> Holden Web LLC                     www.holdenweb.com
> PyCon TX 2006                  www.python.org/pycon/

But if you change urllib2 to urllib, it works under Cygwin. Are they
using different mechanisms to connect to the page?
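One way to narrow this question down is to fetch the same page twice, once through the urllib layer and once through a bare HTTP connection, and compare the failures. The following is a minimal sketch in modern Python 3, where `urllib.request` plays the role of urllib2 and `http.client` that of httplib; it is not the 2.4 code from this thread, and the helper names `fetch_with_urllib`, `fetch_with_http_client` and `compare` are hypothetical. As in the urllib2 snippet quoted above, `urlopen` wraps a low-level socket error in `URLError`, so the sketch prints the original exception and its errno from `err.reason`. If only the urllib path fails, the difference lies in how that layer connects; if both fail with "Operation already in progress", that points at the platform socket layer, as Steve suspected.

```python
import http.client
import sys
import urllib.error
import urllib.parse
import urllib.request


def fetch_with_urllib(url, timeout=10):
    """Fetch through urllib.request (Python 3's successor to urllib2).

    Returns (body, None) on success or (None, message) on failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(), None
    except urllib.error.HTTPError as err:
        # The server answered; an HTTP status error is not a socket problem.
        return err.read(), None
    except urllib.error.URLError as err:
        # urlopen wraps the low-level socket error in URLError; the original
        # exception (with its errno, if any) is kept in err.reason.
        reason = err.reason
        return None, f"URLError: {reason} (errno={getattr(reason, 'errno', None)})"
    except OSError as err:
        # e.g. a timeout raised while reading the body, which is not wrapped
        return None, f"{type(err).__name__}: {err}"


def fetch_with_http_client(url, timeout=10):
    """Fetch the same URL with a bare http.client.HTTPConnection.

    Assumption: plain http:// URLs only; no redirects, proxies or https.
    """
    parts = urllib.parse.urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        return conn.getresponse().read(), None
    except (OSError, http.client.HTTPException) as err:
        return None, f"{type(err).__name__}: {err}"
    finally:
        conn.close()


def compare(urls):
    """Fetch each URL both ways and print which path, if any, fails."""
    results = []
    for url in urls:
        _, via_urllib = fetch_with_urllib(url)
        _, via_http_client = fetch_with_http_client(url)
        print(url)
        print("    urllib.request:", via_urllib or "ok")
        print("    http.client:   ", via_http_client or "ok")
        results.append((url, via_urllib, via_http_client))
    return results


if __name__ == "__main__":
    compare(sys.argv[1:])
```

Run it with the failing URLs as arguments, e.g. `python compare_fetch.py http://freshmeat.net/releases/192449` (the script name is hypothetical). At roughly 60 lines it is also much closer to the short reproduction Steve asked for than the original 220+ line spider.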



