[Python-ideas] PEP 540: Add a new UTF-8 mode

Chris Barker chris.barker at noaa.gov
Thu Jan 12 19:58:07 EST 2017


On Thu, Jan 12, 2017 at 7:50 AM, Stephen J. Turnbull <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:

>  > So I see no downside to using utf-8 when the C locale is defined.
>
> You don't have much incentive to look for one, and I doubt you have
> the experience of the edge cases (if you do, please correct me), so
> that does not surprise me.
>

That's correct. I left out a sentence:

This is a good time for others with experience of the ugly edge cases to
speak up!

The real challenge is that "output" has three (at least :-) ) use cases:

1) Passing on data that came in as input from the same system: Victor's "Unix
pipe style". In that case, if a supposedly ASCII-based system has non-ASCII
data, most users would want it passed through unchanged. They are not
likely to expect their Python program to enforce their locale (unless it
was a program designed to do that, but then it could be explicit about
it).

2) The program generating data itself: the "outputting boxes to the
console" example mentioned earlier in the thread. I think folks writing these
programs should consider whether they really need non-ASCII output, but if
they do need it, I'd imagine most folks would rather see weird characters in
the console than have the program crash.

So both of these point to UTF-8 (with surrogateescape).
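A minimal Python sketch of what "UTF-8 with surrogateescape" buys in use cases 1 and 2. This is my own illustration, not code from PEP 540; the helper names passthrough and crashes_on are hypothetical, and the ASCII codec stands in for what a C locale typically reports (exact stream settings vary by Python version).

```python
# Hypothetical helpers illustrating use cases 1 and 2; not from PEP 540.

def passthrough(data: bytes) -> bytes:
    """Use case 1: decode input bytes and re-encode them unchanged."""
    # surrogateescape maps each undecodable byte to a lone surrogate
    # (U+DC80..U+DCFF) on decode and back to the same byte on encode.
    text = data.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")


def crashes_on(text: str, encoding: str, errors: str = "strict") -> bool:
    """Use case 2: would encoding program-generated text raise?"""
    try:
        text.encode(encoding, errors)
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    # 0xE9 (Latin-1 "e acute") is not valid UTF-8 here, yet it survives.
    raw = b"caf\xe9 \xe2\x94\x80"
    assert passthrough(raw) == raw

    box = "\u250c\u2500\u2510"  # box-drawing characters
    print(crashes_on(box, "utf-8", "surrogateescape"))  # False
    print(crashes_on(box, "ascii", "surrogateescape"))  # True
```

Note the last line: surrogateescape makes pass-through safe, but it cannot turn box-drawing characters into ASCII, so a program limited to ASCII output crashes where a UTF-8 one would just print something.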

3) A program getting input from a user or a data file (a filename, for
example). This may be a program intended to handle Unicode filenames, etc.
(this is my use case :-) ). What should it do when run on an ASCII-only
system?

This is the tough one. If the C locale indicates "not configured", then
users would likely want _something_ written to the filesystem rather than a
program crash: so UTF-8.

However, if the system really is properly configured to be ASCII-only, then
users may want a program never to write non-ASCII to the filesystem.
Ultimately, though, I think it's up to the application developer, rather
than Python itself (or the sysadmin of the OS it's running on), to know
whether the app is supposed to support non-ASCII filenames, etc. That is,
one should expect that running a Unicode-aware app on an ASCII-only
filesystem is going to lead to problems anyway.
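One way an application could make that decision explicitly, sketched in Python under my own assumptions: the helpers representable and choose_name are hypothetical, and falling back to a backslash-escaped ASCII name is just one possible policy (an app could equally refuse with a clear error message).

```python
import sys


def representable(name: str, fs_encoding: str) -> bool:
    """Return True if `name` can be encoded strictly in `fs_encoding`."""
    try:
        name.encode(fs_encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


def choose_name(name: str, fs_encoding: str) -> str:
    """Hypothetical app policy: keep the name if it fits, otherwise
    fall back to an ASCII-only name with non-ASCII chars escaped."""
    if representable(name, fs_encoding):
        return name
    return name.encode("ascii", "backslashreplace").decode("ascii")


if __name__ == "__main__":
    # The real value depends on the locale and the Python version.
    print(sys.getfilesystemencoding())
    print(choose_name("r\u00e9sum\u00e9.txt", "utf-8"))  # unchanged
    print(choose_name("r\u00e9sum\u00e9.txt", "ascii"))  # escaped fallback
```

Python only reports whether the name fits; whether an ASCII-only filesystem is acceptable remains the application's call.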

So I think the "never crash" option is the better one in this imperfect
trade-off.

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov