[Patches] [ python-Patches-661796 ] BZ2File leaking fd and memory

SourceForge.net noreply@sourceforge.net
Mon, 03 Feb 2003 08:09:41 -0800


Patches item #661796, was opened at 2003-01-03 14:22
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=661796&group_id=5470

Category: Modules
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: BZ2File leaking fd and memory

Initial Comment:
The attached patch fixes BZ2File() leaking the fd and
PyFileObject* info (name, read ahead buffer) when an
object is deleted.

I'm not sure the patch fixes these problems in the best
way.  It exposes most of the implementation from
fileobject.c::file_dealloc() through a private API
_PyFile_Dealloc().
BZ2File derives from PyFileObject.

Make sure the file: 'empty' exists and do: 

  from bz2 import *

A simple test which demonstrates the problem is:

  for i in range(100000): o = BZ2File('empty') ; del o

Without the patch, you quickly get:

IOError: [Errno 24] Too many open files: 'empty'

You can modify this to:

  for i in range(100000): o = BZ2File('empty') ;
o.close() ; del o

Now you can see the memory leaks.


----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-03 11:09

Message:
Logged In: YES 
user_id=6380

Thanks! I expect you'll be surprised at the gained clarity
of the resulting code.


----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-02-03 10:52

Message:
Logged In: YES 
user_id=7887

Ok.. I can't find a good way to workaround the presented
reasons. I'll work to unparent BZ2File. 


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-03 09:51

Message:
Logged In: YES 
user_id=6380

IMO, being a subclass of a concrete class typically only
makes sense if you can maintain the meaning of the instance
variables defined by the base class.  In this case I don't
believe that's the case. If looks like you are using
subclassing from the file class where you should really be
using an instance variable that points to a regular file
object. For example, operations on the file descriptor
returned by fileno() have a very different effect for a
BZ2file than for a regular file (since they manipulate the
compressed file rather than the uncompressed byte stream
represented by the BZ2file).

You may also confuse other parts of the Python runtime that
assume that when they see a file instance they can extract
the underlying stdio FILE* from it and manipulate that.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-02-03 07:07

Message:
Logged In: YES 
user_id=7887

zlib module doesn't  emulate a file at C level. It also does
not implement as many features as the BZ2File does
(universal newlines, buffered IO, etc).

The strange thing is, I thought it was a good thing to
develop inheriting FileObject that way, since it would save
code and maintenance, while being fast. I also thought it
was a good model to make BZ2File behave as a subclass (I
even mentioned that in the documentation). I'm surprised for
being wrong.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-02-02 07:57

Message:
Logged In: YES 
user_id=6380

I don't understand. Why would you want to copy all the
implementation details of those functions? Have a look at
how the zlib module emulates a file.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-02-01 23:21

Message:
Logged In: YES 
user_id=7887

Ok, I've started to "unparent" the bz2 code. OTOH, I've just
realised what a lot of code, full of #ifdef's, I'll have to
copy, mostly unchanged. The following functions

file_new(), file_init(), open_the_file(),
fill_file_fields(), dircheck(), close_file(),
get_newlines(), _portable_fseek(), file_seek(), file_repr(),

and the following arrays

file_memberlist[], file_getsetlist[]

My question is, do you think it's better to maintain all
this code duplicated than maintaining bz2 up to date?

I'll do the job, if you still think this is great.


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-01-22 12:29

Message:
Logged In: YES 
user_id=6380

If you could, that would be great.

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-01-22 10:39

Message:
Logged In: YES 
user_id=7887

Ok. Do you want me to work on this for 2.3?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-01-21 14:24

Message:
Logged In: YES 
user_id=6380

FileObject sharing: I see.  But I still think it's a bad
idea, and you're better off using stdio methods directly (as
the evolution of the FileObject implementation has already
shown).

----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-01-20 15:44

Message:
Logged In: YES 
user_id=7887

Sorry about the delay. I was on a not-so-short vacation.

nnorwitz:

Thanks for the patch. I'll check the problem (for my own
understanding) and most certainly apply it.

gvanrossum:

BZ2File used to share a lot of code with FileObject.
Unfortunately (for the BZ2File side), FileObject changed
considerably post 2.2, and some of the code had to be moved
to BZ2File itself. It still
uses some code from FileObject, though (open, close, seek,
part of the softspace handling, access to attributes and
some common methods).


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-01-05 16:11

Message:
Logged In: YES 
user_id=6380

Um, I don't understand why the BZ2File class inherits from
FileObject. It doesn't seem to be inheriting any code, and
there's no reason to inherit from FileObject just to make a
file-like object. Maybe it was a misunderstanding?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-05 15:09

Message:
Logged In: YES 
user_id=33168

I've found a better solution: change the last line of
BZ2File_dealloc() from: 
  ((PyObject*)self)->ob_type->tp_free((PyObject *)self);

to
  PyFile_Type.tp_dealloc((PyObject *)self);
Updated patch attached.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-03 14:48

Message:
Logged In: YES 
user_id=21627

Assigning to the author of the module for review. Gustavo,
if you can't find the time for a review, feel free to
unassign it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=661796&group_id=5470