[pypy-issue] Issue #1952: pypy3 2.4.0 incorrect tokenization of unicode literal (pypy/pypy)

Fri Jan 2 00:48:41 CET 2015

New issue 1952: pypy3 2.4.0 incorrect tokenization of unicode literal
https://bitbucket.org/pypy/pypy/issue/1952/pypy3-240-incorrect-tokenization-of

Anthony Sottile:

Simple testcase:

```
# -*- coding: UTF-8 -*-
from __future__ import unicode_literals
import io
import tokenize

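# A coding cookie followed by a u-prefixed triple-quoted string literal.
test = '# -*- coding: UTF-8 -*-\nu"""☃☃☃"""'

# Print the type name and text of every token the tokenizer emits.
for (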
        token_type, token_str, _, _, _
) in tokenize.generate_tokens(io.StringIO(test).readline):
    print('{0} - {1!r}'.format(tokenize.tok_name[token_type], token_str))
```
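The same reproduction can also be written as an assertion-based regression check. This is a minimal sketch (Python 3 only), assuming the CPython 3.3+ tokenization shown further down is the correct behaviour, i.e. the `u` prefix stays inside a single STRING token. The helper name `significant_tokens` is hypothetical and is not part of the `tokenize` module. Under pypy3 2.4.0 the final assertion would be expected to fail, because the prefix comes out as a separate NAME token.

```python
import io
import tokenize

# Same source as the test case: a coding cookie followed by a
# u-prefixed triple-quoted string literal.
SOURCE = '# -*- coding: UTF-8 -*-\nu"""☃☃☃"""'


def significant_tokens(source):
    """Return (type name, text) pairs for NAME and STRING tokens only.

    Hypothetical helper: comments, NL/NEWLINE and ENDMARKER are dropped
    so the comparison does not depend on version-specific extras.
    """
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type in (tokenize.NAME, tokenize.STRING)]


# Expected (CPython 3.3+): the u prefix is part of the STRING token.
# The reported pypy3 2.4.0 output would instead give
# [('NAME', 'u'), ('STRING', '"""☃☃☃"""')].
assert significant_tokens(SOURCE) == [('STRING', 'u"""☃☃☃"""')]
```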

Output under pypy3 (2.4.0):
```
$ pypy3 --version
Python 3.2.5 (b2091e973da6, Oct 19 2014, 18:29:55)
[PyPy 2.4.0 with GCC 4.6.3]
$ pypy3 test.py
COMMENT - '# -*- coding: UTF-8 -*-'
NL - '\n'
NAME - 'u'
STRING - '"""☃☃☃"""'
ENDMARKER - ''
```

Output under CPython 3.3 (3.3.6) / CPython 3.4 (3.4.0):

```
$ python3.3 test.py
COMMENT - '# -*- coding: UTF-8 -*-'
NL - '\n'
STRING - 'u"""☃☃☃"""'
ENDMARKER - ''
```

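Note that the pypy3 output does match the output under CPython 3.2, but it seems wrong, since pypy3 supports unicode-prefixed (`u"..."`) string literals: the tokenizer should emit `u"""☃☃☃"""` as a single STRING token, as CPython 3.3 and 3.4 do.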
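One plausible mechanism, consistent with the pypy3 output matching CPython 3.2 (PyPy 2.4.0 reports itself as Python 3.2.5), is a string pattern that does not list `u`/`U` as a valid prefix: the `u` is then scanned as a NAME and the rest as a separate STRING. The `u` prefix was only reinstated in Python 3.3 by PEP 414. The toy tokenizer below is a simplified sketch of that effect, not a confirmed diagnosis; its regexes and the `toy_tokenize` function are hypothetical and are not taken from PyPy's or CPython's `tokenize` module. It only handles names and triple-double-quoted strings, so it is run on the literal alone rather than the whole test source.

```python
import re

# Hypothetical, heavily simplified patterns: only triple-double-quoted
# strings with no escape handling. NOT the real tokenize regexes.
NAME = re.compile(r'\w+')
# Roughly Python 3.2 rules: b, r, br prefixes allowed; no u/U.
STRING_NO_U = re.compile(r'(?:[bB]?[rR]?)"""[\s\S]*?"""')
# Simplified Python 3.3+ rules: u/U is accepted again (PEP 414).
STRING_WITH_U = re.compile(r'(?:[uU]|[bB]?[rR]?)"""[\s\S]*?"""')


def toy_tokenize(src, string_re):
    """Scan left to right, trying STRING before NAME at each position."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = string_re.match(src, pos)
        kind = 'STRING'
        if m is None:
            m = NAME.match(src, pos)
            kind = 'NAME'
        if m is None:
            raise ValueError('unexpected character at offset %d' % pos)
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


literal = 'u"""☃☃☃"""'
print(toy_tokenize(literal, STRING_NO_U))
# → [('NAME', 'u'), ('STRING', '"""☃☃☃"""')]   (matches the pypy3 output)
print(toy_tokenize(literal, STRING_WITH_U))
# → [('STRING', 'u"""☃☃☃"""')]                 (matches CPython 3.3/3.4)
```

Trying the string pattern first at each position is what lets a prefix-aware pattern claim the `u`; when the pattern does not know the prefix, the name rule consumes it first and the literal is split.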