[ python-Bugs-1208304 ] urllib2's urlopen() method causes a memory leak
SourceForge.net
noreply at sourceforge.net
Wed Jun 29 05:52:17 CEST 2005
Bugs item #1208304, was opened at 2005-05-25 09:20
Message generated for change (Comment added) made by jafo
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1208304&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Extension Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Petr Toman (manekcz)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2's urlopen() method causes a memory leak
Initial Comment:
It seems that the urlopen(url) method of the urllib2 module
leaves some objects in memory that are never destroyed.
Please try the following code:
==========================
if __name__ == '__main__':
    import urllib2
    a = urllib2.urlopen('http://www.google.com')
    del a  # or a = None or del(a)
    # check for memory leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    for it in gc.garbage:
        print it
==========================
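What the script above counts depends on the DEBUG_SAVEALL flag: with it set, the collector appends every unreachable object it finds to gc.garbage instead of freeing it, so the printed list shows the objects that were caught in reference cycles. The minimal sketch below illustrates this with a hypothetical Node class; it assumes a current CPython 3 (not the Python 2.4 from this report), and make_pair and cycles_saved are illustrative names, not part of any library.

```python
import gc


class Node(object):
    """Hypothetical object used only to build reference cycles."""


def make_pair():
    # Two objects that point at each other: once this function returns,
    # reference counting alone cannot free them; only the cyclic gc can.
    a = Node()
    b = Node()
    a.peer = b
    b.peer = a


def cycles_saved(n, saveall=True):
    """Create n two-object cycles; return how many Nodes end up in gc.garbage."""
    gc.collect()                      # start from a clean slate
    old_flags = gc.get_debug()
    del gc.garbage[:]
    gc.set_debug(gc.DEBUG_SAVEALL if saveall else 0)
    try:
        for _ in range(n):
            make_pair()
        gc.collect()
        return sum(1 for obj in gc.garbage if isinstance(obj, Node))
    finally:
        gc.set_debug(old_flags)       # restore the caller's debug flags
        del gc.garbage[:]             # release the saved objects again


print(cycles_saved(10))                 # → 20 (both Nodes of every cycle are saved)
print(cycles_saved(10, saveall=False))  # → 0 (the cycles are simply freed)
```

Without the flag the same cycles are freed by the collector, so the count under DEBUG_SAVEALL is best read as "objects caught in cycles".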
In our code we call urlopen() many times in a loop, and
the number of unreachable objects grows without limit :)
We also tried calling a.close(), but it didn't help.
You can also try the following:
==========================
def print_unreachable_len():
    # check for memory leaks
    import gc
    gc.set_debug(gc.DEBUG_SAVEALL)
    gc.collect()
    unreachableL = []
    for it in gc.garbage:
        unreachableL.append(it)
    return len(str(unreachableL))

if __name__ == '__main__':
    print "at the beginning", print_unreachable_len()
    import urllib2
    print "after import of urllib2", print_unreachable_len()
    a = urllib2.urlopen('http://www.google.com')
    print 'after urllib2.urlopen', print_unreachable_len()
    del a
    print 'after del', print_unreachable_len()
==========================
We're using Windows XP with the latest patches and Python 2.4
(ActivePython 2.4 Build 243 (ActiveState Corp.) based on
Python 2.4 (#60, Nov 30 2004, 09:34:21) [MSC v.1310
32 bit (Intel)] on win32).
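The second script reports len(str(...)) of the saved objects, which mixes how many objects there are with how long their reprs happen to be. One possible refinement, sketched below using the same DEBUG_SAVEALL approach, tallies the saved objects by type name so that the classes caught in cycles stand out. garbage_by_type and Widget are hypothetical names; collections.Counter requires Python 2.7 or later, so this sketch would not run on the Python 2.4 used in this report.

```python
import collections
import gc


def garbage_by_type():
    """Run a collection with DEBUG_SAVEALL set and count saved objects by type name.

    The saved objects are released again before returning, so calling this
    repeatedly does not itself accumulate memory.
    """
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        gc.collect()
        return collections.Counter(type(obj).__name__ for obj in gc.garbage)
    finally:
        gc.set_debug(old_flags)
        del gc.garbage[:]


class Widget(object):
    """Hypothetical object standing in for whatever the real workload creates."""


if __name__ == '__main__':
    gc.disable()              # stop automatic collection from freeing the cycle first
    try:
        w = Widget()
        w.self_ref = w        # a one-object reference cycle
        del w
        counts = garbage_by_type()
    finally:
        gc.enable()
    print(counts['Widget'])   # → 1
    print(counts.most_common(5))
```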
----------------------------------------------------------------------
>Comment By: Sean Reifschneider (jafo)
Date: 2005-06-29 03:52
Message:
Logged In: YES
user_id=81797
I give up; this code is kind of a maze of twisty little
passages. I tried calling "a.fp.close()", and that didn't
seem to help at all, though I couldn't really make any progress
on that. I also tried "if a.headers.fp: a.headers.fp.close()",
which made no noticeable difference.
----------------------------------------------------------------------
Comment By: Sean Reifschneider (jafo)
Date: 2005-06-29 03:27
Message:
Logged In: YES
user_id=81797
I can reproduce this in both the python.org 2.4 RPM and in a
freshly built copy from CVS. If I run a few thousand
urlopen()s, I get:
Traceback (most recent call last):
  File "/tmp/mt", line 26, in ?
  File "/tmp/python/dist/src/Lib/urllib2.py", line 130, in urlopen
  File "/tmp/python/dist/src/Lib/urllib2.py", line 361, in open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 379, in _open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 340, in _call_chain
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1026, in http_open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1001, in do_open
urllib2.URLError: <urlopen error (24, 'Too many open files')>
This happens even if I call a.close(). I'll investigate a bit further.
Sean
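On Linux, error 24 is EMFILE ("Too many open files"): the process has reached its per-process limit on open file descriptors, which suggests that each urlopen() leaves a socket open. One way to watch for this while looping is to count the entries in /proc/self/fd. The sketch below is Linux-only (it relies on /proc and on the Unix-only resource module), and open_fd_count and fd_soft_limit are hypothetical helper names.

```python
import os
import resource


def open_fd_count():
    """Number of file descriptors this process has open (Linux-only).

    os.listdir() briefly holds one extra descriptor for the directory it is
    reading; that entry is present on every call, so differences between
    two calls are still exact.
    """
    return len(os.listdir('/proc/self/fd'))


def fd_soft_limit():
    """Soft RLIMIT_NOFILE: once this many descriptors are open, opens fail with EMFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == '__main__':
    before = open_fd_count()
    leaked = [open(os.devnull) for _ in range(3)]  # deliberately keep 3 files open
    print(open_fd_count() - before)                # → 3
    for f in leaked:
        f.close()
    print(open_fd_count() - before)                # → 0
    print('soft limit: %d' % fd_soft_limit())
```

Printing open_fd_count() once per loop iteration would show whether the count climbs steadily toward the soft limit.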
----------------------------------------------------------------------
Comment By: A.M. Kuchling (akuchling)
Date: 2005-06-01 23:13
Message:
Logged In: YES
user_id=11375
Confirmed. The objects involved seem to be an HTTPResponse and the
socket._fileobject wrapper; the assignment 'r.recv=r.read' around line 1013
of urllib2.py seems to be critical to creating the cycle.
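The cycle described above can be reproduced in miniature: storing a bound method on the instance it is bound to (as 'r.recv=r.read' does) creates the chain r -> r.__dict__ -> bound method -> r, so once the last outside reference goes away, reference counting alone cannot free r and it waits for the cyclic collector. FakeResponse is a hypothetical stand-in, not httplib's HTTPResponse, and the sketch assumes CPython, whose memory management is based on reference counting.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for httplib.HTTPResponse."""

    def read(self, amt=None):
        return b''


def freed_by_refcount(make_cycle):
    """Return (freed_right_after_del, freed_after_gc_collect) for one object."""
    was_enabled = gc.isenabled()
    gc.disable()                  # only reference counting runs until collect()
    try:
        r = FakeResponse()
        if make_cycle:
            r.recv = r.read       # the bound method holds a strong reference back to r
        alive = weakref.ref(r)    # a weak reference does not keep r alive
        del r
        freed_after_del = alive() is None
        gc.collect()
        freed_after_collect = alive() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_after_del, freed_after_collect


print(freed_by_refcount(make_cycle=False))  # → (True, True)
print(freed_by_refcount(make_cycle=True))   # → (False, True)
```

In this toy version, deleting the attribute (del r.recv) before the last outside reference is dropped would break the cycle and let reference counting free the object immediately.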
----------------------------------------------------------------------
More information about the Python-bugs-list mailing list