[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each

Roddy Shuler report at bugs.python.org
Mon Aug 10 20:04:23 CEST 2015


New submission from Roddy Shuler:

The GNU and USTAR formats handle a file path as a special case if it is longer than 100 bytes. The check for this, though, incorrectly tests for 100 characters rather than 100 bytes. So, if a path is close to but not exceeding 100 characters and includes special characters such that its encoded length is greater than 100 bytes, the encoded string is truncated to 100 bytes, and the file name stored in the tar file is truncated as a result.

For example...

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg

is truncated as:

/gt-education/Colección Educativa Guatemala/thumbs/Libro de Texto Comunicacion y Lenguaje 1 Grado.jp
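The mismatch in the example above can be reproduced without tarfile at all. The following is only a minimal sketch, not the code from the attached patch: it assumes the member name is stored without its leading slash and encoded as UTF-8, and the helpers fits_by_chars and fits_by_bytes are hypothetical names introduced here for illustration.

```python
# Member name from the report, assumed to be stored without the leading slash.
# The accented letter is written as an escape so the code point is unambiguous.
name = ("gt-education/Colecci\u00f3n Educativa Guatemala/thumbs/"
        "Libro de Texto Comunicacion y Lenguaje 1 Grado.jpg")

LIMIT = 100  # length threshold from the report, in bytes


def fits_by_chars(member_name, limit=LIMIT):
    """Hypothetical check that counts characters (the behaviour reported as buggy)."""
    return len(member_name) <= limit


def fits_by_bytes(member_name, limit=LIMIT, encoding="utf-8"):
    """Hypothetical check that counts encoded bytes (the intended behaviour)."""
    return len(member_name.encode(encoding)) <= limit


encoded = name.encode("utf-8")

# What remains if the encoded name is cut to the limit without further handling.
truncated = encoded[:LIMIT].decode("utf-8")

print("characters:", len(name))
print("bytes:     ", len(encoded))
print("fits when counting characters:", fits_by_chars(name))
print("fits when counting bytes:     ", fits_by_bytes(name))
print("truncated name ends with:", truncated[-8:])
```

Counting characters accepts the name as short enough, while counting bytes does not, which is the gap the report describes.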

The attached patch fixes this. I initially found the problem on Python 3.3. The patch was tested on Linux with version 3.4.3-6 from Debian. Looking at the source code, I am fairly confident that the problem still exists upstream in Python 3.5.

----------
files: fix-tarfile-path-truncation.patch
keywords: patch
messages: 248363
nosy: Roddy Shuler
priority: normal
severity: normal
status: open
title: tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
type: behavior
versions: Python 3.5
Added file: http://bugs.python.org/file40157/fix-tarfile-path-truncation.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24838>
_______________________________________

