[issue20387] tokenize/untokenize roundtrip fails with tabs

Sat Jun 20 09:13:46 CEST 2015

Dingyuan Wang added the comment:

Sorry for the inconvenience. I failed to find this old bug.

I think there is another problem. The docs of `untokenize` said "The iterable must return sequences with **at least** two elements, the token type and the token string. Any additional sequence elements are ignored.", so if I feed in, say, a 3-tuple, the untokenize should accept it as tok[:2].

The attached patch should have addressed the problems above. 

When trying to make a patch, a tokenize bug was found. Consider the new attached tab.py, the tabs between comments and code, and the tabs between expressions are lost, so when untokenizing, position information is used to produce equivalent spaces, instead of tabs.

Despite the tokenization problem, the patch can produce syntactically correct code as accurately as it can.

The PEP 8 recommends spaces for indentation, but the usage of tabs should not be ignored.

new tab.py (in Python string):

'#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\ndef foo():\n\t"""\n\tTests tabs in tokenization\n\t\tfoo\n\t"""\n\tpass\n\tpass\n\tif 1:\n\t\t# not indent correctly\n\t\tpass\n\t\t# correct\ttab\n\t\tpass\n\tpass\n\tbaaz = {\'a\ttab\':\t1,\n\t\t\t\'b\': 2}\t\t# also fails\n\npass\n#if 2:\n\t#pass\n#pass\n'

----------
keywords: +patch
nosy: +gumblex
Added file: http://bugs.python.org/file39748/tokenize.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20387>
_______________________________________