[Python-ideas] PEP 540: Add a new UTF-8 mode

Stephan Houben stephanh42 at gmail.com
Fri Jan 6 01:22:52 EST 2017


Hi all,

One meta-question, which may already have been discussed much earlier
in this series of proposals:
How common is this problem?

I ask because I have the impression that nowadays all Linux distributions
default to UTF-8, and you have to show some
bloody-mindedness to end up with a POSIX locale.

Docker was mentioned. Is this not really an issue that should be solved at
the Docker level,
since it affects *all* applications that do something
non-trivial with encodings?

I realise there is some attractiveness in solving the issue "for Python",
since that will reduce the number of bug reports
and get people off the maintainers' backs, but to get this fixed in
the wider Linux ecosystem it might be preferable to
"Let them eat mojibake", to paraphrase what Marie-Antoinette never said.

Stephan

2017-01-06 5:49 GMT+01:00 Steven D'Aprano <steve at pearwood.info>:

> On Fri, Jan 06, 2017 at 02:54:49AM +0100, Victor Stinner wrote:
>
> > Let's say that you have the filename b'nonascii\xff': it's decoded as
> > 'nonascii\udcff' by the UTF-8 mode. How do GUIs handle such a filename?
> > (I don't know the answer, it's a real question ;-))
>
> I ran this in Python 2.7 to create the file:
>
> open(b'/tmp/nonascii\xff-', 'w')
>
> and then confirmed the filename:
>
> [steve at ando tmp]$ ls -b nonascii*
> nonascii\377-
>
> Konqueror in KDE 3 displays it with *two* "missing character" glyphs
> (small hollow boxes) before the hyphen. The KDE "Open File" dialog box
> shows the file with two blank spaces before the hyphen.
>
> My interpretation of this is that the difference is due to using
> different fonts: the file name is shown the same way, but in one font
> the missing character is a small box and in the other it is a blank
> space.
>
> I cannot tell what KDE is using for the invalid character; if I copy it
> as text and paste it into a file, I just get the original \xFF.
>
> The Geany text editor, which I think uses the same GUI toolkit as Gnome,
> shows the file with a single "missing glyph" character, this time a
> black diamond with a question mark in it.
>
> It looks like Geany (Gnome?) is displaying the invalid byte as U+FFFD,
> the Unicode "REPLACEMENT CHARACTER".
>
> So at least two Linux GUI environments are capable of dealing with
> filenames that are invalid UTF-8, in two different ways.
>
> Does this answer your question about GUIs?
>
>
> --
> Steve
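
For reference, a minimal sketch of the two behaviours discussed above,
assuming Python 3. It decodes the same bytes once with the
"surrogateescape" error handler (what Victor's example implies the UTF-8
mode would do) and once with "replace" (which yields U+FFFD, the character
Geany appears to show). The variable names are illustrative only, and
whether any GUI toolkit actually works this way internally is an
assumption, not something confirmed in this thread.

```python
# The filename bytes from Steven's experiment: 0xFF is never valid in UTF-8.
raw = b"nonascii\xff-"

# surrogateescape maps the undecodable byte 0xFF to the lone surrogate U+DCFF.
escaped = raw.decode("utf-8", errors="surrogateescape")

# replace substitutes U+FFFD (REPLACEMENT CHARACTER); the original byte is lost.
replaced = raw.decode("utf-8", errors="replace")

# Only the surrogateescape form can be encoded back to the original bytes.
restored = escaped.encode("utf-8", errors="surrogateescape")

print(ascii(escaped))   # 'nonascii\udcff-'
print(ascii(replaced))  # 'nonascii\ufffd-'
print(restored == raw)  # True
```

The round trip is the point of surrogateescape: the string cannot be
displayed or encoded strictly, but the original filename is still
recoverable, which is not the case once the byte has been replaced.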