[issue17061] tokenize unconditionally emits NL after comment lines & blank lines

Mon Jan 28 12:14:28 CET 2013

New submission from Thomas Kluyver:

The docs describe the NL token as "Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines."

However, after a comment or a blank line, tokenize emits NL, even when it's not inside a multi-line statement. For example:

In [15]: for tok in tokenize.generate_tokens(StringIO('#comment\n').readline):  print(tok)
TokenInfo(type=54 (COMMENT), string='#comment', start=(1, 0), end=(1, 8), line='#comment\n')
TokenInfo(type=55 (NL), string='\n', start=(1, 8), end=(1, 9), line='#comment\n')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')

This makes it difficult to use tokenize to detect multi-line statements, as we want to do in IPython.
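
To make the ambiguity concrete, here is a minimal standalone sketch that compares the token names for a comment-only line with those for a statement that really is continued across two physical lines. The helper name token_names is made up for illustration. It uses tokenize.tok_name rather than numeric token types, since the numbers differ between Python versions.

```python
import io
import tokenize


def token_names(source):
    """Return the names of the tokens that tokenize produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# A comment on its own line: NL is emitted although nothing is continued.
print(token_names('#comment\n'))        # ['COMMENT', 'NL', 'ENDMARKER']

# A statement continued inside parentheses: NL marks the continuation,
# NEWLINE marks the end of the logical line.
print(token_names('x = (1,\n     2)\n'))
```

In both cases an NL token appears, so seeing NL alone does not tell the caller whether a logical line is being continued.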

In my tests so far, changing two instances of NL to NEWLINE in this block (lines 530 & 533) makes it behave as I expect:
http://hg.python.org/cpython/file/a375c3d88c7e/Lib/tokenize.py#l524
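
Separately from the change suggested above, a caller could sidestep the NL ambiguity by ignoring NL altogether and comparing row numbers instead. The following is only a sketch of that idea, not what IPython or tokenize actually does, and the function name multiline_logical_lines is hypothetical. It records the row of the first significant token of each logical line and compares it with the row of the NEWLINE token that ends it.

```python
import io
import tokenize

# Token types that do not start a logical line (assumption for this sketch).
_SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def multiline_logical_lines(source):
    """Yield (first_row, last_row) for logical lines that span several rows."""
    start_row = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NEWLINE:
            # NEWLINE ends the logical line; compare its row with the start.
            if start_row is not None and tok.start[0] > start_row:
                yield (start_row, tok.start[0])
            start_row = None
        elif tok.type not in _SKIP and start_row is None:
            start_row = tok.start[0]


source = '#comment\n\nx = (1,\n     2)\ny = 3\n'
print(list(multiline_logical_lines(source)))  # [(3, 4)]
```

Note that this works per logical line, so a compound statement such as an if block is reported only if one of its individual logical lines is itself continued.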

----------
messages: 180846
nosy: takluyver
priority: normal
severity: normal
status: open
title: tokenize unconditionally emits NL after comment lines & blank lines
versions: Python 2.6, Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17061>
_______________________________________