[issue17153] tarfile extract fails when Unicode in pathname

Lars Gustäbel report at bugs.python.org
Tue Jul 8 12:40:12 CEST 2014


Lars Gustäbel added the comment:

IIRC, tarfile under 2.7 has never been explicitly unicode-safe; support for unicode objects is heterogeneous at best. The obvious work-around is to work exclusively with str objects.

What we can't do is decode the utf-8 pathname from the archive to a unicode object, because we have no way to detect an archive's encoding. We can either emit a warning if the user passes a unicode object to extract(), or implicitly encode the passed unicode object using TarFile.encoding so that the os.path.join() succeeds.
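
To illustrate the second option, here is a minimal sketch of encoding a unicode path before it is joined with a byte-string member name. The helper to_bytes_path() and the sample names are hypothetical and not part of tarfile; the encoding argument stands in for TarFile.encoding. It is meant only to show the idea, not a proposed patch.

```python
import os

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


def to_bytes_path(path, encoding):
    """Hypothetical helper: encode a text path to a byte string.

    Byte strings are returned unchanged, so callers that already
    work exclusively with byte strings are not affected.
    """
    if isinstance(path, text_type):
        return path.encode(encoding)
    return path


# A member name as it would be stored in a utf-8 encoded archive
# (assumed sample data).
member_name = u"caf\xe9.txt".encode("utf-8")

# The user passes a unicode target directory; encode it first so both
# arguments to os.path.join() are byte strings.
target_dir = u"out"
joined = os.path.join(to_bytes_path(target_dir, "utf-8"), member_name)
```

Without the encoding step, joining the unicode directory with the non-ASCII byte-string name fails under 2.7 because of the implicit ASCII decode.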

Unfortunately, I am not entirely sure whether there was a rationale behind the current behaviour of extract(). This needs more inspection.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17153>
_______________________________________

