Windows Python 3.8 modulefinder problem.

Sun Apr 5 09:22:13 EDT 2020

I have code that uses the modulefinder:

            mf = modulefinder.ModuleFinder()
            mf.run_script( self.main_program )

With Python 3.7 everything works without problems, but Python 3.8 raises a traceback (TB). Here is the end of the TB:

  File "C:\Python38.Win64\lib\modulefinder.py", line 326, in import_module
    m = self.load_module(fqname, fp, pathname, stuff)
  File "C:\Python38.Win64\lib\modulefinder.py", line 344, in load_module
    co = compile(fp.read()+'\n', pathname, 'exec')
  File "C:\Python38.Win64\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 308: character maps to <undefined>

I added this debug print in both the 3.7 and 3.8 code of modulefinder.py:

    def load_module(self, fqname, fp, pathname, file_info):
        print('QQQ load_module(%r, %r, %r, %r)' % (fqname, fp, pathname, file_info))

The file that causes the TB is functools.py, which contains non-ASCII text at offset 308.
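
A minimal sketch of how a single non-ASCII character can trigger this error. The character 'Ł' is only an assumed example, not confirmed to be the character at offset 308; what matters is that its UTF-8 encoding contains the byte 0x81, which cp1252 leaves undefined. The helper name try_decode is hypothetical.

```python
# Assumed example character; its UTF-8 encoding happens to contain byte 0x81.
sample = 'Ł'
utf8_bytes = sample.encode('utf-8')   # two bytes: 0xC5 0x81

def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as e:
        return e

utf8_result = try_decode(utf8_bytes, 'utf-8')      # decodes cleanly
cp1252_result = try_decode(utf8_bytes, 'cp1252')   # fails on byte 0x81
print(repr(utf8_result))
print(repr(cp1252_result))
```

The same bytes are fine as UTF-8 but cannot be decoded as cp1252, which matches the 'charmap' error in the TB.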

The debug output shows this for functools.py (3.7 first, then 3.8):

QQQ load_module('functools', <_io.TextIOWrapper name='C:\\Python37.win64\\lib\\functools.py' mode='r' encoding='utf-8'>, 'C:\\Python37.win64\\lib\\functools.py', ('.py', 'r', 1))

QQQ load_module('functools', <_io.TextIOWrapper name='C:\\Python38.Win64\\lib\\functools.py' mode='r' encoding='cp1252'>, 'C:\\Python38.Win64\\lib\\functools.py', ('.py', 'r', 1))

In 3.7 the fp is opened with encoding UTF-8, but in 3.8 it is cp1252.

The code in modulefinder does not seem to handle encoding when opening .py files.
Adding an explicit coding comment to functools.py did not work.
So the default encoding is used, which according to the docs for open() is
locale.getpreferredencoding(False).
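
A quick way to check that relationship on a given interpreter, as a sketch only. The helper names are hypothetical, and the temporary file exists purely to have something to open.

```python
import locale
import os
import tempfile

def preferred_encoding():
    """Encoding the open() docs describe as the default for text mode."""
    return locale.getpreferredencoding(False)

def encoding_open_picks(path):
    """Encoding that open() actually selects when none is passed."""
    with open(path) as f:
        return f.encoding

# Create an empty temporary file just so open() has a real path to work on.
fd, tmp_path = tempfile.mkstemp(suffix='.py')
os.close(fd)
try:
    print('preferred:', preferred_encoding())
    print('open()   :', encoding_open_picks(tmp_path))
finally:
    os.remove(tmp_path)
```

Running this under both 3.7 and 3.8, with and without PYTHONUTF8=1, should show whether the two values move together.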

How did modulefinder end up with utf-8 encoding being used in 3.7? It does seem to look at
the chcp setting.

On both 3.7 and 3.8 I see that locale.getpreferredencoding(False) returns 'cp1252'.
I have not figured out how the 3.7 code manages to use the utf-8 encoding that is required
to get things working.

I can work around this by setting PYTHONUTF8=1, but I want to change the behaviour from within Python.

I have failed to find a way to change what is returned by locale.getpreferredencoding(False) from
within Python. Is setting PYTHONUTF8 the only way?
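
One possible approach from within Python, sketched under assumptions rather than offered as a confirmed fix: subclass ModuleFinder and override load_module() so that source files handed over as text streams are reopened with tokenize.open(), which takes the encoding from the coding cookie or BOM and otherwise falls back to UTF-8. The class name SourceEncodingModuleFinder is hypothetical, the value 1 for Python source is taken from the file_info tuples in the debug output, and the sketch assumes load_module() keeps the signature shown in the debug print. It has not been verified against 3.8 on Windows.

```python
import io
import modulefinder
import tokenize

# Assumption: third item of file_info for .py source files, as in the debug output.
PY_SOURCE = 1

class SourceEncodingModuleFinder(modulefinder.ModuleFinder):
    """Hypothetical ModuleFinder that ignores the locale encoding for .py files."""

    def load_module(self, fqname, fp, pathname, file_info):
        kind = file_info[2]
        # Only replace text-mode streams; binary streams are passed through untouched.
        if kind == PY_SOURCE and isinstance(fp, io.TextIOBase):
            # tokenize.open() decodes using the file's own declared encoding.
            with tokenize.open(pathname) as source_fp:
                return super().load_module(fqname, source_fp, pathname, file_info)
        return super().load_module(fqname, fp, pathname, file_info)
```

It would be used in place of modulefinder.ModuleFinder(), for example mf = SourceEncodingModuleFinder() followed by mf.run_script(...). The original fp is left for the caller to close, so the override only manages the stream it opens itself.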

Barry