[issue9561] distutils: set encoding to utf-8 for input and output files

Toshio Kuratomi report at bugs.python.org
Mon Sep 13 17:37:07 CEST 2010


Toshio Kuratomi <a.badger at gmail.com> added the comment:

>>> - RPM spec files, which use ASCII or UTF-8 according to
>>> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
>>> it’s not confirmed in
>>> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
>>> from the LSB site)
>> UTF-8 is a superset of ASCII. If you use utf-8 but only write ascii
>> characters, your output file will be written to utf-8... but it will be also
>> encoded to ascii. It's magical :-)
>
> I know that, but it does not answer the question:  Is it okay for these files
> to use UTF-8?

rpm spec files are encoding-agnostic, similar to POSIX filesystems.  This causes no end of trouble for people writing python code that deals with rpm, of course, as they cannot rely on the bytes they are dealing with having the same encoding from one package to another (remember that things like dependency solvers have to compare the information from multiple packages to make their decisions).
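
As a rough illustration of that problem: the same name stored by two packages can be different byte strings, so a tool has to guess at the encoding before it can compare them.  This is only a sketch; decode_spec_field is a hypothetical helper (not part of rpm or distutils) and the fallback order is an assumption.

```python
def decode_spec_field(raw, candidates=("utf-8", "latin-1")):
    """Decode bytes from a spec file, trying each candidate encoding in order.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if every candidate failed (latin-1 itself never fails).
    return raw.decode("ascii", errors="replace")


# The same text as two different packagers might have stored it.
utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("latin-1")

# The raw bytes differ, so a byte-for-byte comparison would not match...
same_bytes = utf8_bytes == latin1_bytes
# ...but the decoded text does.
same_text = decode_spec_field(utf8_bytes) == decode_spec_field(latin1_bytes)
```

Note that the latin-1 fallback accepts any byte sequence, so it can silently produce the wrong text for files that are really in some other encoding.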

Individual distributions will have different policies about encoding and the use of unicode in spec files to try to mitigate the problems.  For instance, Fedora specifies utf-8 in the spec files and additionally specifies that package names must be ascii.  (So if there were a package named python-café, we would likely transcribe it as python-cafe when we made a package for it.)
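
A minimal sketch of that kind of transcription, assuming that dropping accents is acceptable.  ascii_package_name is a hypothetical helper, not a Fedora tool; in practice a packager chooses the ascii name by hand.

```python
import unicodedata


def ascii_package_name(name):
    """Return an ascii-only approximation of a package name.

    Hypothetical helper: decomposes accented characters (NFKD) and then
    drops anything that still is not ascii, such as the combining accents.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


transcribed = ascii_package_name("python-café")
```

Characters with no ascii decomposition are simply dropped by this approach, so it is not suitable for names written in non-Latin scripts.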

utf-8 is a good default for locales on POSIX systems, so it's a good default for encoding spec files, but I know there are some people out there who make their own packages that aren't utf-8.  I haven't checked, but I also wouldn't be surprised if some Asian countries (where the bytes-per-character with utf-8 is high) have local distributions that use a non-utf-8 encoding as well.  Whether either of these use cases needs to be catered to in distutils (when the support is going away in distutils2) I'll leave to someone else to decide.  My personal gut instinct is no, but I'm not one of the people using a non-utf-8 locale.
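
For reference, a minimal sketch of what defaulting to utf-8 looks like when writing and reading a spec file, instead of relying on the locale's encoding.  write_spec and read_spec are hypothetical helpers, not distutils functions, and the spec content is made up.

```python
import os
import tempfile


def write_spec(path, text, encoding="utf-8"):
    """Write spec text with an explicit encoding (utf-8 unless overridden)."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(text)


def read_spec(path, encoding="utf-8"):
    """Read spec text back with the same explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary spec file and read it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.spec")
        write_spec(path, text, encoding=encoding)
        return read_spec(path, encoding=encoding)


spec_text = "Name: python-cafe\nSummary: Un café, s'il vous plaît\n"
result = roundtrip(spec_text)
```

Keeping the encoding as a parameter leaves room for the non-utf-8 cases mentioned above without changing the default.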

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9561>
_______________________________________

