[Python-Dev] Improved tmpfile module

Zack Weinberg zack@codesourcery.com
Thu, 27 Jun 2002 15:12:28 -0700


I'm not subscribed to python-dev.  Please cc: me directly on replies.
I'm going to respond to all the comments at once.

Greg Ward wrote:
> > Attached please find a rewritten and improved tmpfile.py.  The major
> > change is to make the temporary file names significantly harder to
> > predict.  This foils denial-of-service attacks, where a hostile
> > program floods /tmp with files named @12345.NNNN to prevent process
> > 12345 from creating any temp files.  It also makes the race condition
> > inherent in tmpfile.mktemp() somewhat harder to exploit.
> 
> Oh, good!  I've long wished that there was a tmpfile module written by
> someone who understands the security issues involved in generating
> temporary filenames and files.  I hope you do... ;-)

Well, I wrote the analogous code in the GNU C library (using basically
the same algorithm).  I'm confident it is safe on a Unix-based system.
On Windows and others, I am relying on os.open(..., os.O_EXCL) to do
what it claims to do; assuming it does, the code should be safe there too.

> > (fd, name) = mkstemp(suffix="", binary=1): Creates a temporary file,
> > returning both an OS-level file descriptor open on it and its name.
> > This is useful in situations where you need to know the name of the
> > temporary file, but can't risk the race in mktemp.
> 
> +1 except for the name.  What does the "s" stand for?  Unfortunately, I
> can't think of a more descriptive name offhand.

Fredrik Lundh's suggestion that it is for "safer" seems plausible, but
I do not actually know.  I chose the names mkstemp and mkdtemp to
match the functions of the same name in most modern Unix C libraries.
Since they don't take the same "template" parameter that those
functions do, that was probably a bad idea.

[Note to Fredrik: at the C level, mkstemp is not deprecated in favor
of tmpfile, as they do very different things - tmpfile(3) is analogous
to tmpfile.TemporaryFile(), you don't get the file name back.]

I'm open to suggestions for a better routine name; I can't think of a
good one myself.

> > name = mkdtemp(suffix=""): Creates a temporary directory, without
> > race.
> 
> How about calling this one mktempdir() ?

Sure.

> I've scanned your code and the existing tempfile.py.  I don't
> understand why you rearranged things.  Please explain why your
> arrangement of _TemporaryFileWrapper/TemporaryFile/
> NamedTemporaryFile is better than what we have.

I was trying to get all the user-accessible interfaces to be at the
top of the file.  Also, I do not understand the bits in the existing
file that delete names out of the module namespace after we're done
with them, so I wound up taking all of that out to get it to work.  I
think the existing file's organization was largely determined by those
'del' statements.

I'm happy to organize the file any way y'all like -- I'm kind of new
to Python and I don't know the conventions yet.


> A few minor comments on the code...
> 
> > if os.name == 'nt':
> >     _template = '~%s~'
> > elif os.name in ('mac', 'riscos'):
> >     _template = 'Python-Tmp-%s'
> > else:
> >     _template = 'pyt%s' # better ideas?
> 
> Why reveal the implementation language of the application creating these
> temporary names?  More importantly, why do it certain platforms, but not
> others?

This is largely as it was in the old file.  I happen to know that ~%s~
is conventional for temporary files on Windows.  I changed 'tmp%s' to
'pyt%s' for Unix to make it consistent with Mac/RiscOS

Ideally one would allow the calling application to control the prefix, but
I'm not sure what the right interface is.  Maybe

 tmpfile.mkstemp(prefix="", suffix="")

where if one argument is provided it gets treated as the suffix, but
if two are provided the prefix comes first, a la range()?  Is there a
way to express that in the prototype?


> > ### Recommended, user-visible interfaces.
> > 
> > _text_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL
> > if os.name == 'posix':
> >     _bin_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL
> 
> Why not just "_bin_openflags = _text_openflags" ?  That clarifies their
> equality on Unix.
> 
> > else:
> >     _bin_openflags = os.O_RDWR | os.O_CREAT | os.O_EXCL | os.O_BINARY
> 
> Why not "_bin_openflags = _text_openflags | os.O_BINARY" ?

*shrug* Okay.


> 
> > def mkstemp(suffix="", binary=1):
> >     """Function to create a named temporary file, with 'suffix' for
> >     its suffix.  Returns an OS-level handle to the file and the name,
> >     as a tuple.  If 'binary' is 1, the file is opened in binary mode,
> >     otherwise text mode (if this is a meaningful concept for the
> >     operating system in use).  In any case, the file is readable and
> >     writable only by the creating user, and executable by no one."""
> 
> "Function to" is redundant.

I didn't change much of this text from the old file.  Where are
docstring conventions documented?

>     """Create a named temporary file.
> 
>     Create a named temporary file with 'suffix' for its suffix.  Return
>     a tuple (fd, name) where 'fd' is an OS-level handle to the file, and
>     'name' is the complete path to the file.  If 'binary' is true, the
>     file is opened in binary mode, otherwise text mode (if this is a
>     meaningful concept for the operating system in use).  In any case,
>     the file is readable and writable only by the creating user, and
>     executable by no one (on platforms where that makes sense).
>     """

Okay.

> Hmmm: if suffix == ".bat", the file is executable on some platforms.
> That last sentence still needs work.

   ... In any case, the file is readable and writable only by the
   creating user.  On platforms where the file's permission bits
   control whether it can be executed as a program, no one can.  Other
   platforms have other ways of controlling this: for instance, under
   Windows, the suffix determines whether the file can be executed.

How's that?

> > class _TemporaryFileWrapper:
> >     """Temporary file wrapper
> > 
> >     This class provides a wrapper around files opened for temporary use.
> >     In particular, it seeks to automatically remove the file when it is
> >     no longer needed.
> >     """
> 
> Here's where I started getting confused.  I don't dispute that the
> existing code could stand some rearrangement, but I don't understand why
> you did it the way you did.  Please clarify!

See above.  What would you consider a sensible arrangement?

> 
> > ### Deprecated, user-visible interfaces.
> > 
> > def mktemp(suffix=""):
> >     """User-callable function to return a unique temporary file name."""
> >     while 1:
> >         name = _candidate_name(suffix)
> >         if not os.path.exists(name):
> >             return name
> 
> The docstring for mktemp() should state *why* it's bad to use this
> function -- otherwise people will say, "oh, this looks like it does what
> I need" and use it in ignorance.  So should the library reference
> manual.

Good point.

   """Suggest a name to be used for a temporary file.

   This function returns a file name, with 'suffix' for its suffix,
   which did not correspond to any file at some point in the past.  By
   the time you get the return value of this function, a file may have
   already been created with that name.  It is therefore unsafe to use
   this function for any purpose.  It is deprecated and may be removed
   in a future version of Python."""

and corresponding text in the library manual?

Tim Peters wrote:
> 
> -1 on the implementation here, because it didn't start with current CVS, so
> is missing important work that went into improving this module on Windows
> for 2.3.  Whether spawned/forked processes inherit descriptors for "temp
> files" is also a security issue that's addressed in current CVS but seemed
> to have gotten dropped on the floor here.

I'll get my hands on a copy of current CVS and rework my changes
against that.

> A note on UI:  for many programmers, "it's a feature" that temp file names
> contain the pid.  I don't think we can get away with taking that away no
> matter how stridently someone claims it's bad for us <wink>.

GNU libc took that away from C programmers about four years ago and no
one even noticed.  FreeBSD libc, ditto, although I'm not sure when it
happened.

zw