[issue9974] tokenizer.untokenize not invariant with line continuations

Brian Bossé report at bugs.python.org
Fri Oct 1 18:09:23 CEST 2010


Brian Bossé <penrif at gmail.com> added the comment:

No idea if I'm getting the patch format right here, but tally ho!

This is keyed from release27-maint

Index: Lib/tokenize.py
===================================================================
--- Lib/tokenize.py	(revision 85136)
+++ Lib/tokenize.py	(working copy)
@@ -184,8 +184,13 @@
 
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:  
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)

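The round-trip problem the patch addresses can be reproduced with a small sketch (assuming Python 3's tokenize module, where this fix has since landed; the exact whitespace of the output may differ from the input, per issue 1 below):

```python
import io
import tokenize

# Source with an explicit line continuation; the backslash-newline is
# consumed by the tokenizer without emitting an NL or NEWLINE token.
src = "x = 1 + \\\n    2\n"

tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
result = tokenize.untokenize(tokens)

# With the row-offset handling in place, untokenize reinserts a
# backslash-newline so the result is still valid source that
# evaluates to the same thing, even if leading whitespace is lost.
ns = {}
exec(compile(result, "<untokenized>", "exec"), ns)
assert ns["x"] == 3
```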
Two issues remain with this fix, which replaces the assert with code that works but does not reproduce the original source exactly:
1)  Whitespace leading up to a line continuation is not recreated.  The information required to do this is not present in the tokenized data.
2)  If EOF happens at the end of a line, the untokenized version will have a line continuation on the end, as the ENDMARKER token is represented on a line which does not exist in the original.

I spent some time trying to write a unit test that demonstrates the original bug, but it seems that doctest (which test_tokenize uses) cannot represent a '\' character properly.  The existing unit tests involving line continuations pass only because the '\' characters are interpreted as ERRORTOKEN, which is not how they are tokenized when read from a file or the interactive prompt.
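The point about ERRORTOKEN can be checked directly: when a continuation line is fed through readline, as it is for a real file, the backslash-newline is swallowed silently and no token records it (a sketch assuming Python 3's tokenize module):

```python
import io
import tokenize

src = "x = 1 + \\\n    2\n"
toks = list(tokenize.generate_tokens(io.StringIO(src).readline))

# No ERRORTOKEN appears: the continuation (and the whitespace before
# it) leaves no trace in the token stream, which is why untokenize
# cannot recreate it from the tokens alone.
assert all(t.type != tokenize.ERRORTOKEN for t in toks)
```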

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9974>
_______________________________________