[Python-Dev] Bytes path support

Nick Coghlan ncoghlan at gmail.com
Thu Aug 21 16:12:50 CEST 2014


On 21 August 2014 23:58, Marko Rauhamaa <marko at pacujo.net> wrote:
>
> My point is that the poor programmer cannot ignore the possibility of
> "funny" character sets. If Python tried to protect the programmer from
> that possibility, the result might be even more intractable: how to act
> on a file with a non-UTF-8 filename if you are unable to express it as
> a text string?

That's what the "surrogateescape" error handler is for - we use it by
default on most OS interfaces, and it's implicit in the use of
"os.fsencode" and "os.fsdecode". Starting with Python 3.5, it's also
enabled on sys.stdout by default, so that "print(os.listdir(dirname))"
will pass the original raw bytes through to the terminal the same way
Python 2 does.
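
A minimal sketch of that round trip, using only the standard library.
The filename bytes are an illustrative assumption (a Latin-1 encoded
"café.txt"), not something taken from this thread:

```python
import os

# Latin-1 encoded "café.txt": the 0xE9 byte is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# "surrogateescape" maps each undecodable byte (0x80-0xFF) to a lone
# surrogate code point in U+DC80..U+DCFF instead of raising an error.
text_name = raw_name.decode("utf-8", errors="surrogateescape")
print(repr(text_name))  # 'caf\udce9.txt'

# Encoding with the same handler recovers the original bytes exactly,
# so the text form can still be used to act on the file.
assert text_name.encode("utf-8", errors="surrogateescape") == raw_name

# On POSIX, os.fsdecode()/os.fsencode() apply the filesystem encoding
# with surrogateescape implicitly, giving the same lossless round trip.
if os.name == "posix":
    assert os.fsencode(os.fsdecode(raw_name)) == raw_name
```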

The docs could use additional details as to which interfaces do and
don't have surrogateescape enabled by default, but for the time being,
the description of the codec error handler just links out to the
original definition in PEP 383.
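
Until the docs spell this out, the handlers in effect can be checked at
runtime along these lines. Note that sys.getfilesystemencodeerrors()
only arrived in Python 3.6, and the printed values depend on the
platform, Python version and locale:

```python
import sys

# Encoding and error handler used for OS paths (os.fsencode/os.fsdecode);
# typically "surrogateescape" on POSIX.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()  # Python 3.6+

# The standard streams carry their own handler, which varies with the
# Python version and locale, so check it rather than assume it.
stdout_errors = getattr(sys.stdout, "errors", None)

print(f"filesystem: {fs_encoding} / {fs_errors}")
print(f"stdout errors: {stdout_errors}")
```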

It may also be useful to have some tools for detecting and cleaning
strings containing surrogate escaped data, but there hasn't been a
concrete proposal along those lines as yet. Personally, I'm currently
waiting to see if the Fedora or OpenStack folks indicate a need for
such tools before proposing any additions.
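
Purely as a hypothetical sketch of what such tools might look like (the
names has_escaped_bytes and clean_escaped are invented for illustration,
and the code assumes the string was produced by UTF-8 decoding with
surrogateescape):

```python
def has_escaped_bytes(text: str) -> bool:
    """Hypothetical helper: True if text contains surrogate-escaped bytes.

    surrogateescape maps each undecodable byte 0x80-0xFF to a lone
    surrogate in the range U+DC80..U+DCFF.
    """
    return any("\udc80" <= ch <= "\udcff" for ch in text)


def clean_escaped(text: str, errors: str = "replace") -> str:
    """Hypothetical helper: make surrogate-escaped bytes safe to display.

    Assumes text came from bytes.decode("utf-8", "surrogateescape").
    The original bytes are recovered first, then re-decoded with a lossy
    error handler such as "replace" (U+FFFD) or "backslashreplace".
    Raises UnicodeEncodeError for lone surrogates outside U+DC80..U+DCFF.
    """
    raw = text.encode("utf-8", errors="surrogateescape")
    return raw.decode("utf-8", errors=errors)


name = "caf\udce9.txt"  # as produced by decoding b"caf\xe9.txt"
print(has_escaped_bytes(name))                  # True
print(ascii(clean_escaped(name)))               # 'caf\ufffd.txt'
print(clean_escaped(name, "backslashreplace"))  # caf\xe9.txt
```

Re-decoding the recovered bytes, rather than stripping the surrogates,
keeps any valid UTF-8 in the name intact and lets the caller pick how
lossy the cleaned result should be.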

Regards,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

