[Python-Dev] tarfile and directory traversal vulnerability

Lars Gustäbel lars at gustaebel.de
Mon Aug 27 21:59:50 CEST 2007


On Mon, Aug 27, 2007 at 07:40:36PM +0200, Jan Matejek wrote:
> Lars Gustäbel wrote:
> > Suppose we have:
> > foo -> /etc
> > foo/passwd
> > 
> > If creation of the foo symlink is delayed, foo/passwd will be
> > extracted in a directory foo which will be created implicitly.
> > If we create the foo symlink afterwards it will fail because foo
> > already exists. The best way would be to completely ignore
> > members and link targets that are absolute or outside the
> > archive's scope.
> 
> GNU tar doesn't descend into symlinked directories when extracting, so
> such an archive fails anyway:
> 
> # tar xvf foo.tar
> foo
> foo/passwd
> tar: foo/passwd: Cannot open: Not a directory
> tar: Error exit delayed from previous errors
> 
> I think that is the simplest solution, but I'm not sure how best to
> implement it in extractall().

GNU tar creates a placeholder file for every hard or symbolic
link during extraction and, in a second step, replaces the
placeholders with the actual links.
I don't think this is a good choice for a library, because it
leads to delayed and (from the user's point of view) seemingly
unrelated errors. I would prefer that archive members whose
pathnames start with "/" or "../" raise an exception by
default and can be extracted only on explicit request.
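
To make the proposed default concrete, here is a minimal sketch of
such a check. It is not the patch mentioned below. The names
UnsafeMemberError, is_unsafe_name, checked_members and
safe_extractall are hypothetical and exist only for illustration.
The sketch looks at member pathnames only, so it does not by itself
cover the "foo -> /etc" case, where the link target is the problem.

```python
import tarfile


class UnsafeMemberError(Exception):
    """Hypothetical exception for members with a suspicious pathname."""


def is_unsafe_name(name):
    """Return True if a member pathname starts with "/" or "../"."""
    # Member names inside a tar archive use "/" as the separator.
    return name.startswith("/") or name == ".." or name.startswith("../")


def checked_members(tar, allow_unsafe=False):
    """Yield the members of an open TarFile, raising on unsafe names.

    Passing allow_unsafe=True stands in for the "explicit request"
    to extract such members anyway.
    """
    for member in tar.getmembers():
        if not allow_unsafe and is_unsafe_name(member.name):
            raise UnsafeMemberError("refusing to extract %r" % member.name)
        yield member


def safe_extractall(archive_path, dest="."):
    """Extract an archive, stopping at the first unsafe member name."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, members=checked_members(tar))
```

With this shape the error is raised at the offending member itself,
not in a later, seemingly unrelated step.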

I am currently working on a patch. Should we move this
discussion over to the bugtracker?

-- 
Lars Gustäbel
lars at gustaebel.de

Linux is like a wigwam - no Gates, no Windows, Apache inside.

