[issue31557] tarfile: incorrectly treats regular file as directory

Joe Tsai report at bugs.python.org
Fri Sep 22 17:42:02 EDT 2017


New submission from Joe Tsai:

The original V7 header only allocates 100 bytes to store the file path. If a path exceeds this length, then either the PAX or the GNU format must be used, both of which can represent arbitrarily long file paths. When doing so, most tar writers just store the first 100 bytes of the file path in the V7 header.

When reading, a proper reader should disregard the contents of the V7 field if a previous and corresponding PAX or GNU header overrode it.
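As a rough illustration of the two paragraphs above, here is a minimal sketch of how a truncating writer fills the V7 name field and how a reader could pick the effective name. The helper names (v7_name_field, effective_name) are hypothetical and are not part of tarfile:

```python
from typing import Optional

V7_NAME_SIZE = 100  # bytes reserved for the path in the V7 header


def v7_name_field(path: str) -> bytes:
    """Return what a truncating writer would store in the V7 name field."""
    return path.encode("utf-8")[:V7_NAME_SIZE]


def effective_name(v7_name: bytes, override: Optional[str]) -> str:
    """Prefer the name from a preceding PAX or GNU record, if there was one."""
    if override is not None:
        return override  # the V7 field is only a truncated copy; ignore it
    return v7_name.decode("utf-8")


# Hypothetical path whose 100th byte happens to be a slash.
long_path = "d" * 99 + "/" + "file.txt"
field = v7_name_field(long_path)
print(field.endswith(b"/"))
print(effective_name(field, long_path) == long_path)
```

The point of the sketch is only that the truncated field can end in a slash even though the real path does not.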

This is currently not the case with the tarfile module, which has the following check (https://github.com/python/cpython/blob/c7cc14a825ec156c76329f65bed0d0bd6e03d035/Lib/tarfile.py#L1054-L1057):
    # Old V7 tar format represents a directory as a regular
    # file with a trailing slash.
    if obj.type == AREGTYPE and obj.name.endswith("/"):
        obj.type = DIRTYPE

This check should be further constrained to only activate when there were no prior PAX or GNU records that override the value of obj.name. This check was the source of a bug that caused tarfile to report a regular file as a directory: the file path was extra long, and when the tar writer truncated the path to the first 100 bytes, the truncated path happened to end on a slash.
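A possible way to reproduce this with tarfile alone is sketched below. It assumes that tarfile's PAX writer keeps the first 100 bytes of the path in the V7 name field, and it sets the type flag to AREGTYPE because that is the only type the check inspects. The path and payload are made up for illustration, and the result of the last line depends on whether the interpreter in use still has the unconstrained check, so no output is claimed here:

```python
import io
import tarfile

# Hypothetical path: 99 filler bytes, a slash as byte 100, then the file name.
long_name = "d" * 99 + "/" + "file.txt"
payload = b"hello"

# Write a PAX archive in memory with one old-style regular file entry.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(long_name)
    info.size = len(payload)
    info.type = tarfile.AREGTYPE  # NUL type flag, as written by V7-era tools
    tar.addfile(info, io.BytesIO(payload))

raw = buf.getvalue()
# The V7 name field holds the truncated path and is followed by the mode field.
truncated_in_header = (b"d" * 99 + b"/" + b"0000644\x00") in raw

# Read the archive back and inspect the first member.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmembers()[0]

print("truncated name ends with slash:", truncated_in_header)
print("name restored from PAX record:", member.name == long_name)
print("isdir:", member.isdir(), "isreg:", member.isreg())
```

If isdir is reported as True, the reader applied the V7 trailing-slash rule to the truncated field even though a PAX record had already supplied the full path.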

----------
messages: 302778
nosy: Joe Tsai
priority: normal
severity: normal
status: open
title: tarfile: incorrectly treats regular file as directory

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31557>
_______________________________________

