[Python-ideas] Force UTF-8 option regardless locale

INADA Naoki songofacandy at gmail.com
Tue Aug 30 19:34:22 EDT 2016


On Wed, Aug 31, 2016 at 4:45 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 30.08.2016 10:29, Victor Stinner wrote:
>> On 30 Aug 2016 02:05, "INADA Naoki" <songofacandy at gmail.com> wrote:
>>> How should the option be set?
>>
>> I propose adding a new -X utf8 option. If the use case is important
>> enough, we might also add a PYTHONUTF8 environment variable.
>>
>> The problem is that I'm not sure an env var is the right way to
>> configure Python in such an environment. But an env var shouldn't hurt,
>> and it is common to add a new env var alongside a new cmdline option.
>>
>> I added PYTHONFAULTHANDLER=1/-X faulthandler for faulthandler and
>> PYTHONTRACEMALLOC=N/-X tracemalloc=N for tracemalloc.
>
> In PyRun we simply define a default for PYTHONIOENCODING and
> set this to utf-8:
>
> http://www.egenix.com/products/python/PyRun/doc/#_Toc452660008
>
> The encoding guessing is still available by setting the env
> var to "" (but this is hardly used).
>
> So far this has been working great.

My concern is people other than me who run Python scripts on such systems
(which have only the C locale).
Most Unix commands run well in the C locale; only Python scripts run into
a lot of trouble:

* A locale error just from running a Python script (when the LANG setting
  is bad).

* A UnicodeError when stdout is piped, although the same script runs fine
  without the pipe (when LANG=C and PYTHONIOENCODING is not set).

* open() without an explicit `encoding='utf-8'` works on Mac and in
  LANG=*.utf8 environments, but raises UnicodeError in a LANG=C environment.
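
To make the second point concrete, here is a minimal sketch. It assumes
that a stdout stream in the C locale encodes as ASCII, which matches the
failures described above; `encode_like_stream` is a hypothetical helper
written for this mail, not something Python provides.

```python
import io


def encode_like_stream(text, encoding):
    """Write text through a TextIOWrapper that uses the given encoding.

    Returns the encoded bytes, or None if the text cannot be encoded.
    Hypothetical helper, for illustration only.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        # This is the failure a script sees when its output stream is ASCII.
        return None
    finally:
        # Detach so that `raw` stays readable after the wrapper is gone.
        stream.detach()
    return raw.getvalue()


# Assumed stand-in for a C-locale stream: non-ASCII text cannot be written.
print(encode_like_stream("\u3042", "ascii"))
# With UTF-8 the same text is written without error.
print(encode_like_stream("\u3042", "utf-8"))
```

The same text succeeds or fails purely depending on the stream encoding,
which is why the script itself looks correct on a developer machine.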

(Actually, my company and I don't use UTF-8 filenames, so we don't run
into trouble with fsencoding.  But some other companies may.)


On such systems, a site-wide configuration that overrides
`nl_langinfo(CODESET)` may help people. Otherwise, they:

1. Hit a locale error when running a Python script, and write LANG=C in
   their .bashrc.

2. Hit a UnicodeError when piping from a Python script, and write
   PYTHONIOENCODING=utf-8 in their .bashrc.

3. Hit a UnicodeError when reading or writing a text file, and add an
   explicit `encoding='utf-8'`.
   (This bug may go unnoticed in a CI environment that has a *.UTF-8
   locale, and then show up in the production environment.)

4. Finally, feel that Python is a troublesome language, and don't want to
   use Python anymore.
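
For reference, the fix in step 3 looks roughly like the sketch below. The
helper names `fallback_encoding` and `roundtrip_utf8` are made up for this
example. It relies on the documented behaviour that open() without an
`encoding` argument falls back to the locale's preferred encoding.

```python
import locale
import os
import tempfile


def fallback_encoding():
    """Return the locale's preferred encoding, which open() falls back to
    when no `encoding` argument is given (hypothetical helper name)."""
    return locale.getpreferredencoding(False)


def roundtrip_utf8(text):
    """Write text to a temporary file and read it back, always as UTF-8,
    so the result does not depend on LANG (hypothetical helper name)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


print("open() would fall back to:", fallback_encoding())
print(roundtrip_utf8("\u65e5\u672c\u8a9e"))
```

Every open() call in every script has to be written this way, which is why
the mistake is easy to miss until the code reaches a LANG=C machine.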

I know about the `/etc/environment` file, but ops people don't like adding
lines to it only for Python.
They feel "Perl (or Ruby) is better than Python".


This is why I think a configuration option or site-wide configuration is
desirable, even if we have PYTHON(IO|FS|PREFERRED)ENCODING environment
variables.
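
As a rough illustration of what the existing per-process knob does, the
sketch below starts a child interpreter with a piped stdout and
PYTHONIOENCODING=utf-8 in its environment. `run_child` is a hypothetical
helper; the point is only that each caller has to pass the variable
itself.

```python
import os
import subprocess
import sys


def run_child(extra_env):
    """Run a child Python that prints one non-ASCII character to a pipe.

    Returns (returncode, stdout_bytes). Hypothetical helper, for
    illustration only.
    """
    env = dict(os.environ)
    env.update(extra_env)
    proc = subprocess.run(
        # The child's source is pure ASCII; the escape is expanded there.
        [sys.executable, "-c", "print('\\u3042')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout


# With the variable set, the child writes UTF-8 bytes to the pipe.
returncode, output = run_child({"PYTHONIOENCODING": "utf-8"})
print(returncode, output)
```

This works, but only for processes whose environment carries the
variable, which is the gap a site-wide setting would close.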




-- 
INADA Naoki  <songofacandy at gmail.com>

