[issue38755] Long unicode string causes SyntaxError: Non-UTF-8 code starting with '\xe2' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Sat Nov 9 07:26:48 EST 2019

New submission from Andrew Ushakov <andrew.ushakov at gmail.com>:

A not very long Unicode comment consisting of #, a space, and then 170 or more repetitions of the UTF-8 character ░ (b'\xe2\x96\x91'.decode()):

# ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

causes a syntax error:

SyntaxError: Non-UTF-8 code starting with '\xe2' in file tst112.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The Python file is attached. A second example is similar, but here a Unicode string of similar length is passed as an argument to the print function:

print('\n░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░')

A similar issue, Issue34979, was submitted one year ago.
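
For anyone who wants to try this without the attachment, here is a minimal sketch that regenerates both kinds of one-line script and runs them with the current interpreter. The helper names (comment_source, print_source, run_source) and the default of 170 repetitions for the print case are hypothetical choices for illustration, not taken from tst112.py. Note that "# " plus 170 three-byte characters comes to 512 bytes before the newline; whether that size is what matters is an assumption, not something established in this report.

```python
import os
import subprocess
import sys
import tempfile

# The character from the report; it occupies 3 bytes in UTF-8.
BLOCK = b"\xe2\x96\x91".decode("utf-8")


def comment_source(repeats=170):
    """One-line script: '#', a space, then `repeats` copies of the character."""
    return ("# " + BLOCK * repeats + "\n").encode("utf-8")


def print_source(repeats=170):
    """One-line script that passes a long string literal to print()."""
    return ("print('\\n" + BLOCK * repeats + "')\n").encode("utf-8")


def run_source(source):
    """Write `source` to a temporary .py file and run it with this interpreter."""
    # Sanity check: raises UnicodeDecodeError if the bytes are not valid UTF-8.
    source.decode("utf-8")
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source)
        proc = subprocess.run(
            [sys.executable, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        return proc.returncode, proc.stderr.decode("utf-8", "replace")
    finally:
        os.remove(path)


if __name__ == "__main__":
    cases = (("comment", comment_source()), ("print", print_source()))
    for name, source in cases:
        code, err = run_source(source)
        print(name, "script bytes:", len(source), "exit code:", code)
        if "SyntaxError" in err:
            # Show only the final line of the traceback.
            print(err.strip().splitlines()[-1])
```

On an interpreter or platform that is not affected, both scripts should simply exit with code 0 and no SyntaxError line is printed. Varying the `repeats` argument may help narrow down the length at which the error starts to appear.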

----------
components: Interpreter Core
files: tst112.py
messages: 356298
nosy: Andrew Ushakov
priority: normal
severity: normal
status: open
title: Long unicode string causes SyntaxError: Non-UTF-8 code starting with '\xe2' in file ..., but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
versions: Python 3.8
Added file: https://bugs.python.org/file48703/tst112.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38755>
_______________________________________