filesystem encoding 'strict' on Windows

Fri Sep 30 04:42:47 EDT 2016

On Fri, Sep 30, 2016 at 5:58 AM, iMath <redstone-cold at 163.com> wrote:
> The doc of os.fsencode(filename) says "Encode filename to the filesystem encoding 'strict'
> on Windows". What does 'strict' mean?

"strict" is the error handler for the encoding. It raises a
UnicodeEncodeError for characters that have no mapping in the target
encoding. For example:

    >>> 'αβψδ'.encode('mbcs', 'strict')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'mbcs' codec can't encode characters in
    position 0--1: invalid character
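The "mbcs" codec only exists on Windows builds of Python, so here is a
portable sketch that uses the "cp1252" codec as a stand-in for the ANSI
codepage. That substitution is an assumption: Python's own cp1252 codec
is not identical to "mbcs" (it does no best-fit mapping). The helper
name try_encode is hypothetical; it just shows that "strict" raises and
that the exception's start/end attributes locate the rejected characters.

```python
def try_encode(text, encoding, errors='strict'):
    """Encode text; on failure, report where it failed and return None."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError as exc:
        # exc.start and exc.end delimit the characters the codec rejected
        bad = text[exc.start:exc.end]
        print(f'{encoding}: cannot encode {bad!r} '
              f'at {exc.start}-{exc.end}: {exc.reason}')
        return None

# Greek letters are not in codepage 1252, so 'strict' raises
result = try_encode('αβψδ', 'cp1252')
# Plain ASCII is mappable, so encoding succeeds
print(try_encode('abc', 'cp1252'))
```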

On the other hand, the "replace" error handler is lossy. With the
Windows "mbcs" codec, it substitutes question marks and best-fit
mappings for characters that aren't defined in the system locale's
ANSI codepage (e.g. 1252). For example:

    >>> print('αβψδ'.encode('mbcs', 'replace').decode('mbcs'))
    aß?d
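For comparison, a portable sketch with Python's built-in "cp1252"
codec (again a stand-in, not "mbcs"): because that codec has no best-fit
table, "replace" turns every unmapped character into a question mark,
and the original text cannot be recovered by decoding.

```python
text = 'αβψδ'
# Python's charmap codecs substitute b'?' for each unmapped character
lossy = text.encode('cp1252', 'replace')
print(lossy)                           # b'????'
# Decoding does not restore the original characters
print(lossy.decode('cp1252') == text)  # False
```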

This is how os.listdir behaves with bytes paths, which is why using
bytes paths has been deprecated on Windows since 3.3.
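To see why lossy encoding is a problem for file names, here is a small
sketch with hypothetical names, again using "cp1252" as a portable
stand-in for the ANSI codepage: distinct Unicode names can collapse to
the same bytes. "?" is also not a valid character in Windows file
names, so such a bytes name can't be used to open the original file.

```python
# Hypothetical directory entries; the Greek letters are not in cp1252
names = ['ψ.txt', 'ω.txt', 'plain.txt']
encoded = [name.encode('cp1252', 'replace') for name in names]
print(encoded)  # [b'?.txt', b'?.txt', b'plain.txt']
# Two different names now map to the same bytes
print(len(set(encoded)) < len(names))  # True
```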

In 3.6, bytes paths are provisionally allowed again because the
filesystem encoding has changed to UTF-8 (internally transcoded to the
native UTF-16LE), with the "surrogatepass" error handler so that lone
surrogate codes, which Windows allows in file names, survive the round
trip. See PEP 529 for more information.
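A portable sketch of what "surrogatepass" does with UTF-8. It does not
call any Windows API; the string below is a made-up name containing a
lone low surrogate. "strict" rejects the surrogate, while
"surrogatepass" encodes it and decodes it back unchanged, including when
transcoding to UTF-16LE.

```python
lone = 'a\udc80b'  # hypothetical name with a lone low surrogate

try:
    lone.encode('utf-8')  # 'strict' is the default handler
except UnicodeEncodeError as exc:
    print('strict failed:', exc.reason)  # surrogates not allowed

raw = lone.encode('utf-8', 'surrogatepass')
print(raw)  # b'a\xed\xb2\x80b'

# Transcode UTF-8 -> UTF-16LE -> str, passing the surrogate through each step
wide = raw.decode('utf-8', 'surrogatepass').encode('utf-16-le', 'surrogatepass')
print(wide.decode('utf-16-le', 'surrogatepass') == lone)  # True
```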