[New-bugs-announce] [issue38663] Untokenize does not round-trip ws before bs-nl

Fri Nov 1 13:31:35 EDT 2019

New submission from Edward K Ream <edreamleo at gmail.com>:

Tested on 3.6.

tokenize.untokenize does not round-trip whitespace before backslash-newlines outside of strings:

from io import BytesIO
import tokenize

# Round tripping fails on the second string.
table = (
r'''
print\
    ("abc")
''',
r'''
print \
    ("abc")
''',
)
for s in table:
    tokens = list(tokenize.tokenize(
        BytesIO(s.encode('utf-8')).readline))
    # The token list starts with an ENCODING token, so untokenize returns bytes.
    result = tokenize.untokenize(tokens).decode('utf-8')
    print(result==s)

I have an important use case that would benefit from an untokenize that round-trips exactly. After considerable study, I have not found a proper fix for tokenize.Untokenizer.add_whitespace.
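
A minimal sketch of why the whitespace is hard to recover, assuming the untokenizer rebuilds spacing only from each token's start and end positions. The helper name token_positions and the two sample strings are illustrative and not part of the tokenize module. The sketch compares the token streams of the two inputs; if they come out equal, the space before the backslash is not present anywhere in the (type, string, start, end) data.

```python
from io import BytesIO
import tokenize


def token_positions(source):
    """Return (type name, string, start, end) for every token in source.

    Hypothetical helper for illustration only.
    """
    readline = BytesIO(source.encode('utf-8')).readline
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.tokenize(readline)
    ]


# The two inputs differ only by one space before the backslash-newline.
no_space = 'print\\\n    ("abc")\n'
with_space = 'print \\\n    ("abc")\n'

# Neither the space nor the backslash is a token, so the position data
# of the two streams can be compared directly.
same = token_positions(no_space) == token_positions(with_space)
print(same)
```

In both inputs the NAME token for print ends at column 5 and the next token starts on the following row, so a fix would presumably need another source of information, such as the line attribute carried by each token.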

I would be happy to work with anyone to rewrite tokenize.untokenize so that unit tests pass without fudges in TestRoundtrip.check_roundtrip.

----------
messages: 355827
nosy: edreamleo
priority: normal
severity: normal
status: open
title: Untokenize does not round-trip ws before bs-nl
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38663>
_______________________________________