[New-bugs-announce] [issue43014] tokenize spends a lot of time in `re.compile(...)`
Anthony Sottile
report at bugs.python.org
Sun Jan 24 03:34:14 EST 2021
New submission from Anthony Sottile <asottile at umich.edu>:
I did some profiling (attached a few files here with svgs) of running this script:
```python
import io
import tokenize

# picked as the second longest file in cpython
with open('Lib/test/test_socket.py', 'rb') as f:
    bio = io.BytesIO(f.read())


def main():
    for _ in range(10):
        bio.seek(0)
        for _ in tokenize.tokenize(bio.readline):
            pass


if __name__ == '__main__':
    exit(main())
```
The first profile is before the optimization, the second is after. The optimization takes the execution from ~6300ms to ~4500ms on my machine (representing a 28% - 39% improvement, depending on how you calculate it).
(I'll attach the pstats and svgs after creation; it seems I can only attach one file at a time)
----------
components: Library (Lib)
files: out.pstats
messages: 385572
nosy: Anthony Sottile
priority: normal
severity: normal
status: open
title: tokenize spends a lot of time in `re.compile(...)`
type: performance
versions: Python 3.10, Python 3.9
Added file: https://bugs.python.org/file49759/out.pstats
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43014>
_______________________________________