[issue42687] tokenize module does not recognize Barry as FLUFL

Sat Dec 19 10:46:27 EST 2020

New submission from Erik Soma <stillusingirc at gmail.com>:

The tokenize module does not recognize '<>' as a single token; it splits it into two tokens instead.

```
$ python -c "import tokenize; import io; import pprint; pprint.pprint(list(tokenize.tokenize(io.BytesIO(b'<>').readline)))"
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<', start=(1, 0), end=(1, 1), line='<>'),
 TokenInfo(type=54 (OP), string='>', start=(1, 1), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```
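For easier comparison, the one-liner above can be reduced to a small script that prints only the operator tokens. This is a minimal sketch; the helper name `op_strings` is made up for illustration and is not part of the tokenize module. On the versions listed in this report it should print two separate operators, but other Python versions may differ.

```python
import io
import tokenize


def op_strings(source):
    """Return the string of every OP token that tokenize yields for source (bytes)."""
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.OP]


# Hypothetical helper applied to the same input as the report.
ops = op_strings(b"<>")
print(ops)
```

A single-element list would mean '<>' was recognized as one token; a two-element list reproduces the behavior reported here.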

I would expect:
```
[TokenInfo(type=62 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line=''),
 TokenInfo(type=54 (OP), string='<>', start=(1, 0), end=(1, 2), line='<>'),
 TokenInfo(type=4 (NEWLINE), string='', start=(1, 2), end=(1, 3), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')]
```

This is the behavior of the CPython tokenizer, which the tokenize module tries "to match the working of".

----------
messages: 383384
nosy: esoma
priority: normal
severity: normal
status: open
title: tokenize module does not recognize Barry as FLUFL
versions: Python 3.10, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42687>
_______________________________________