[New-bugs-announce] [issue42453] utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows

赵豪杰 report at bugs.python.org
Tue Nov 24 11:26:27 EST 2020


New submission from 赵豪杰 <1292756898 at qq.com>:

When `pip install package_name` installs a package, it generates an `installed-files.txt` file that records the files the package contains.

When updating or uninstalling the package, pip reads the `installed-files.txt` file and then deletes the old files.

The problem occurs when the installed package contains files whose names include non-ASCII characters, such as `文件` (Chinese for "file").

In China (I don't know about other regions), for historical reasons the default Windows system codec is `gbk`, so `installed-files.txt` is also written with the `gbk` codec when a package is installed.
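To see which codec Python would pick on a given machine, the following sketch may help. It assumes, as this report suggests, that the file is written through `open()` without an explicit `encoding` argument; in Python 3.9 that falls back to `locale.getpreferredencoding(False)`. On Simplified Chinese Windows this is usually `cp936`, which Python treats as an alias of its `gbk` codec.

```python
import codecs
import locale

# Python 3.9's open() in text mode, with no encoding argument, uses
# locale.getpreferredencoding(False). On Simplified Chinese Windows this is
# usually 'cp936' (the ANSI code page), not UTF-8.
default_text_encoding = locale.getpreferredencoding(False)

# codecs.lookup() resolves aliases to the canonical codec name.
canonical = codecs.lookup(default_text_encoding).name
print(f"open() default: {default_text_encoding} (codec: {canonical})")

# 'cp936' is an alias of Python's 'gbk' codec.
print(codecs.lookup("cp936").name)  # gbk
```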

When updating or uninstalling, however, pip reads `installed-files.txt` with the `utf-8` codec. Because the file contains non-ASCII characters encoded as `gbk`, decoding fails:

```
  File "d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pkg_resources\__init__.py", line 1424, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 343: invalid start byte in installed-files.txt file at path: d:\users\haujet\appdata\local\programs\python\python39\lib\site-packages\Markdown_Toolbox-0.0.8-py3.9.egg-info\installed-files.txt
```
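The mismatch can be reproduced without pip. The sketch below encodes a file name as `gbk` (standing in for what gets written on a GBK-locale system) and then attempts to decode it as `utf-8`, as the traceback shows pip doing. The file name `文件.md` and the helper `decodes_as_utf8` are illustrative only.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (illustrative helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical file name; any CJK name shows the same behavior.
name = "文件.md"

gbk_bytes = name.encode("gbk")     # bytes as written under a GBK locale
utf8_bytes = name.encode("utf-8")  # bytes as written with explicit UTF-8

print(gbk_bytes, decodes_as_utf8(gbk_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```

Here the `gbk` bytes for `文` start with `0xCE 0xC4`, which is not a valid UTF-8 sequence, so the `utf-8` decode raises `UnicodeDecodeError` just as in the traceback above.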

I hate that default `gbk` system codec, but this setting is fixed on Windows.

So my suggestion is to wrap the failing decode in a `try`/`except`: if the `utf-8` codec fails while reading `installed-files.txt`, fall back to the `gbk` codec.
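A minimal sketch of the suggested fallback is shown below. It is not pip's or `pkg_resources`' actual code; `decode_installed_files` is a hypothetical helper that operates on the raw bytes of the metadata file, trying `utf-8` first and then a fallback codec (defaulting to `gbk`, per this report).

```python
def decode_installed_files(raw: bytes, fallback: str = "gbk") -> str:
    """Hypothetical helper: decode metadata bytes as UTF-8, else a fallback codec.

    UTF-8 is tried first so files already written correctly are unaffected.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Only reached for bytes that are not valid UTF-8, e.g. a file
        # written under a GBK system locale.
        return raw.decode(fallback)


content = "文件.md\n"
print(decode_installed_files(content.encode("utf-8")))
print(decode_installed_files(content.encode("gbk")))
```

One design caveat: some `gbk` byte sequences also happen to be valid UTF-8 and would silently decode to the wrong text, and hard-coding `gbk` only helps GBK-locale systems. Using `locale.getpreferredencoding(False)` as the fallback would generalize it, which is part of why writing UTF-8 consistently is the more robust fix.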

A more fundamental solution would be for pip to always use the `utf-8` codec when writing text files, instead of the default system codec.
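The second suggestion amounts to passing an explicit encoding on both the write and the read side, so the result no longer depends on the system locale. The sketch below assumes `installed-files.txt` is a plain list of paths, one per line; `write_installed_files`, `read_installed_files`, and the sample paths are hypothetical, not pip's API.

```python
import tempfile
from pathlib import Path
from typing import List


def write_installed_files(path: Path, files: List[str]) -> None:
    # Explicit encoding: the bytes on disk are the same regardless of locale.
    path.write_text("\n".join(files) + "\n", encoding="utf-8")


def read_installed_files(path: Path) -> List[str]:
    # Read back with the same explicit encoding; splitlines() also copes
    # with the \r\n newlines that text mode writes on Windows.
    return path.read_text(encoding="utf-8").splitlines()


# Hypothetical entries, including a non-ASCII file name.
entries = ["文件.md", "markdown_toolbox/__init__.py"]

with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "installed-files.txt"
    write_installed_files(record, entries)
    round_trip = read_installed_files(record)

print(round_trip)
```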

----------
components: Windows
messages: 381753
nosy: HaujetZhao, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: utf-8 codec error when pip uninstalling a package which has files containing unicode filename on Windows
type: crash
versions: Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42453>
_______________________________________

