[issue8390] tarfile: use surrogates for undecode fields

STINNER Victor report at bugs.python.org
Wed Apr 14 01:53:15 CEST 2010


New submission from STINNER Victor <victor.stinner at haypocalc.com>:

When reading a tar archive, tarfile decodes fields using the "replace" error handler by default. As a result, we lose information whenever a field contains an undecodable byte: the original byte is replaced and cannot be recovered.

Since PEP 383, undecodable filenames are stored using surrogates in Python 3. I think it would be a good idea to use surrogates for tar as well, because undecodable data in a tar archive is a common problem (see the Unicode section of the tarfile documentation).
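
To illustrate the difference between the two error handlers, here is a minimal sketch using plain bytes.decode() rather than tarfile itself. The sample filename bytes are made up for the example and are not taken from the patch; "surrogateescape" is the error handler name that PEP 383 introduced.

```python
# Hypothetical header field: "café.txt" encoded as Latin-1, so the
# byte 0xE9 is not valid UTF-8.
raw = b"caf\xe9.txt"

# "replace" substitutes U+FFFD for the bad byte; the original byte is lost.
replaced = raw.decode("utf-8", "replace")

# "surrogateescape" maps the bad byte 0xE9 to the lone surrogate U+DCE9.
escaped = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = escaped.encode("utf-8", "surrogateescape")

# The "replace" result cannot be turned back into the original bytes.
lossy = replaced.encode("utf-8")
```

With "surrogateescape", restored equals raw, so a name read from an archive could be written back unchanged, whereas the "replace" result no longer carries the original byte.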

----------
components: Library (Lib), Unicode
files: tarfile_surrogates.patch
keywords: patch
messages: 103099
nosy: haypo, loewis
severity: normal
status: open
title: tarfile: use surrogates for undecode fields
versions: Python 3.1, Python 3.2
Added file: http://bugs.python.org/file16917/tarfile_surrogates.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8390>
_______________________________________

