filesystem encoding 'strict' on Windows

eryk sun eryksun at gmail.com
Fri Sep 30 04:42:47 EDT 2016


On Fri, Sep 30, 2016 at 5:58 AM, iMath <redstone-cold at 163.com> wrote:
> the doc of os.fsencode(filename)  says  Encode filename to the filesystem encoding 'strict'
> on Windows, what does 'strict' mean ?

"strict" is the error handler for the encoding. It raises a
UnicodeEncodeError for unmapped characters. For example:

    >>> 'αβψδ'.encode('mbcs', 'strict')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'mbcs' codec can't encode characters in
    position 0--1: invalid character
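
The "mbcs" codec exists only on Windows. As a rough sketch that runs
on any platform, the same failure can be reproduced with Python's
"cp1252" codec, which is assumed here as a stand-in for the ANSI
codepage. The variable names are only illustrative:

```python
# Assumption: 'cp1252' stands in for the Windows ANSI codepage so that
# this runs anywhere; 'mbcs' itself is available only on Windows.
text = 'αβψδ'

try:
    # 'strict' is also the default when no error handler is given.
    text.encode('cp1252', 'strict')
except UnicodeEncodeError as exc:
    # exc.start and exc.end delimit the run of unencodable characters.
    failed = (exc.start, exc.end, exc.reason)
    print('strict failed:', failed)
else:
    failed = None
    print('strict succeeded')
```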

On the other hand, the "replace" error handler is lossy. With the
Windows "mbcs" codec, it substitutes question marks and best-fit
mappings for characters that aren't defined in the system locale's
ANSI codepage (e.g. 1252). For example:

    >>> print('αβψδ'.encode('mbcs', 'replace').decode('mbcs'))
    aß?d
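
For comparison, here is a small sketch of "replace" with Python's own
"cp1252" codec (again an assumed stand-in for the ANSI codepage).
Unlike "mbcs", it has no best-fit mappings, so every unmapped
character becomes a question mark. Either way the original text can't
be recovered:

```python
text = 'αβψδ'

# Python's charmap-based cp1252 codec does no best-fit mapping, so
# each unmapped character is replaced by b'?'.
lossy = text.encode('cp1252', 'replace')
print(lossy)

# Decoding the result does not give back the original string.
roundtrip = lossy.decode('cp1252')
print(roundtrip == text)
```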

This lossy conversion is what os.listdir does to filenames when it is
called with a bytes path, which is why using bytes paths has been
deprecated on Windows since 3.3.

In 3.6, bytes paths are provisionally allowed again because the
filesystem encoding has changed to UTF-8 (internally transcoded to the
native UTF-16LE). It uses the "surrogatepass" error handler to allow
lone surrogate codes, which Windows permits in filenames. See PEP 529
for more information.
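
To illustrate the difference, this sketch encodes a lone surrogate
with the plain "utf-8" codec under both handlers. It doesn't call
os.fsencode itself, because the handler that function uses depends on
the platform and Python version; the last line just prints whatever
the running interpreter reports (sys.getfilesystemencodeerrors needs
3.6 or later):

```python
import sys

lone = '\ud800'  # a lone high surrogate

# 'strict' refuses to encode a lone surrogate as UTF-8.
try:
    lone.encode('utf-8', 'strict')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'surrogatepass' encodes it anyway, and the result round-trips.
passed = lone.encode('utf-8', 'surrogatepass')
restored = passed.decode('utf-8', 'surrogatepass')
print(strict_ok, passed, restored == lone)

# What this interpreter actually uses for filenames.
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
```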


