Should stdlib files contain 'narrow non breaking space' U+202F?

Fri Dec 18 01:51:32 EST 2015

On Fri, Dec 18, 2015 at 5:36 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> Last I knew, Guido still wanted stdlib files to be all-ascii, except
> possibly in special cases. There is no good reason I can think of for there
> to be an invisible non-ascii space in a comment.  It strikes me as most
> likely an accident (typo) that should be fixed.  I suspect the same of most
> of the following.  Perhaps you should file an issue (and patch?) on the
> tracker.

You're probably right on that one. Here are the others, along with the
script I used to find them.

import os

# Walk the current directory, skipping test directories and test files.
for root, dirs, files in os.walk("."):
    if "test" in root:
        continue
    for fn in files:
        if not fn.endswith(".py") or "test" in fn:
            continue
        with open(os.path.join(root, fn), "rb") as f:
            # enumerate() counts from 0, so the reported line numbers are
            # one less than an editor shows (hence the ":0:" entries below).
            for lineno, line in enumerate(f):
                try:
                    line.decode("ascii")
                    continue  # Ignore the ASCII lines
                except UnicodeDecodeError:
                    line = line.rstrip(b"\n")
                    try:
                        line = line.decode("UTF-8")
                    except UnicodeDecodeError:
                        # If it's not UTF-8 either, show it as b'...'
                        line = repr(line)
                    print("%s:%d: %s" % (fn, lineno, line))
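Since U+202F is invisible in most editors, printing the whole line doesn't show which character is the culprit. A minimal sketch that names each non-ASCII code point with `unicodedata.name`; the helper `non_ascii_chars` and the sample line are made up for illustration (the sample uses a `\u202f` escape so the character is visible in the source):

```python
import unicodedata

def non_ascii_chars(text):
    """Return (column, code point, Unicode name) for each non-ASCII character."""
    return [
        (col, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for col, ch in enumerate(text)
        if ord(ch) > 127
    ]

# Hypothetical sample line with a narrow no-break space after the '#'.
line = "# Issue #\u202f20540: concatenate before sending"
for col, cp, name in non_ascii_chars(line):
    print(col, cp, name)  # 9 U+202F NARROW NO-BREAK SPACE
```

This could be called on each decoded line in the loop above in place of (or alongside) the plain print.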

shlex.py:37:             self.wordchars += ('ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'
shlex.py:38:                                'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ')
functools.py:7: # and Łukasz Langa <lukasz at langa.pl>.
heapq.py:34: [explanation by François Pinard]
getopt.py:21: # Peter Åstrand <astrand at lysator.liu.se> added gnu_getopt().
sre_compile.py:26:     (0x69, 0x131), # iı
sre_compile.py:28:     (0x73, 0x17f), # sſ
sre_compile.py:30:     (0xb5, 0x3bc), # µμ
sre_compile.py:32:     (0x345, 0x3b9, 0x1fbe), # \u0345ιι
sre_compile.py:34:     (0x390, 0x1fd3), # ΐΐ
sre_compile.py:36:     (0x3b0, 0x1fe3), # ΰΰ
sre_compile.py:38:     (0x3b2, 0x3d0), # βϐ
sre_compile.py:40:     (0x3b5, 0x3f5), # εϵ
sre_compile.py:42:     (0x3b8, 0x3d1), # θϑ
sre_compile.py:44:     (0x3ba, 0x3f0), # κϰ
sre_compile.py:46:     (0x3c0, 0x3d6), # πϖ
sre_compile.py:48:     (0x3c1, 0x3f1), # ρϱ
sre_compile.py:50:     (0x3c2, 0x3c3), # ςσ
sre_compile.py:52:     (0x3c6, 0x3d5), # φϕ
sre_compile.py:54:     (0x1e61, 0x1e9b), # ṡẛ
sre_compile.py:56:     (0xfb05, 0xfb06), # ﬅﬆ
punycode.py:2: Written by Martin v. Löwis.
koi8_t.py:2: # http://ru.wikipedia.org/wiki/КОИ-8
__init__.py:0: # Copyright (C) 2005 Martin v. Löwis
client.py:737:         a Date representing the file’s last-modified time, a
client.py:739:         containing a guess at the file’s type. See also the
bdist_msi.py:0: # Copyright (C) 2005, 2006 Martin von Löwis
connection.py:399:             # Issue # 20540: concatenate before sending, to avoid delays due
message.py:531:                        filename=('utf-8', '', Fußballer.ppt'))
message.py:533:                        filename='Fußballer.ppt'))
request.py:181:     * geturl() — return the URL of the resource retrieved, commonly used to
request.py:184:     * info() — return the meta-information of the page, such as headers, in the
request.py:188:     * getcode() – return the HTTP status code of the response.  Raises URLError
dbapi2.py:2: # Copyright (C) 2004-2005 Gerhard Häring <gh at ghaering.de>
__init__.py:2: # Copyright (C) 2005 Gerhard Häring <gh at ghaering.de>

They're nearly all in comments; a few are in string literals, mostly docstrings.
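To check the comments-versus-literals split mechanically rather than by eye, the `tokenize` module can tell the two apart. A sketch under the assumption that only COMMENT and STRING tokens matter (docstrings count as STRING); `non_ascii_tokens` and the sample source are hypothetical:

```python
import io
import token
import tokenize

def non_ascii_tokens(source):
    """Yield (line number, token type name, token text) for every COMMENT
    or STRING token in `source` that contains a non-ASCII character."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in (tokenize.COMMENT, tokenize.STRING):
            if any(ord(ch) > 127 for ch in tok.string):
                yield tok.start[0], token.tok_name[tok.type], tok.string

sample = (
    "# Copyright (C) 2005 Martin v. L\u00f6wis\n"
    "name = 'Fu\u00dfballer.ppt'\n"
    "x = 1  # plain ASCII\n"
)
for lineno, kind, text in non_ascii_tokens(sample):
    print(lineno, kind, text)
# 1 COMMENT # Copyright (C) 2005 Martin v. Löwis
# 2 STRING 'Fußballer.ppt'
```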

I'd be inclined to ASCIIfy the apostrophes, the dashes, and the
connection.py space that started this thread. People's names, URLs,
and characters that are there as examples (such as the code points
shown in the sre_compile.py comments) I'd rather leave alone. Agreed?
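The ASCIIfication proposed above could be a simple `str.translate` over just those characters. A sketch; the mapping is an assumption (in particular, whether an em dash should become "-" or "--" is a style call), and names, URLs and example characters are deliberately not in it:

```python
# Hypothetical mapping for the characters proposed for ASCIIfication only.
ASCIIFY = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> apostrophe
    "\u2013": "-",  # EN DASH -> hyphen-minus
    "\u2014": "-",  # EM DASH -> hyphen-minus
    "\u202f": " ",  # NARROW NO-BREAK SPACE -> ordinary space
})

def asciify(text):
    """Replace only the mapped characters; everything else is untouched."""
    return text.translate(ASCIIFY)

print(asciify("the file\u2019s type"))              # the file's type
print(asciify("* geturl() \u2014 return the URL"))  # * geturl() - return the URL
```

Because untouched characters pass straight through, "Martin v. Löwis" and the sre_compile.py comments come out unchanged.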

ChrisA