[New-bugs-announce] [issue28531] Improve utf7 encoder memory usage

Xiang Zhang report at bugs.python.org
Tue Oct 25 12:16:46 EDT 2016


New submission from Xiang Zhang:

Currently the utf7 encoder uses an aggressive memory allocation strategy: it allocates for a worst case of 8, i.e. 8*n bytes for a string of n characters. We can tighten the worst case.

For 1-byte and 2-byte unicode strings, the worst case could be 3*n + 2. For 4-byte unicode strings, the worst case could be 6*n + 2.

There are 2 cases to consider. First, every character needs to be encoded: the result length is upper_round(2.67*n) + 2 <= 3*n + 2. Second, characters that need encoding alternate one by one with characters that don't. For even length, the result is 3n < 3n + 2. For odd length, it is exactly 3n + 2.
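The two cases above can be checked empirically from Python. This is only a sketch: utf7_upper_bound is a hypothetical helper written for illustration (it is not part of the patch), and the sample strings are arbitrary. It measures the length of the encoder's output, which does not depend on how the buffer is allocated.

```python
def utf7_upper_bound(s):
    """Hypothetical helper: the proposed worst-case size of s.encode("utf-7")."""
    n = len(s)
    # 4-byte unicode strings contain at least one code point above U+FFFF.
    wide = any(ord(ch) > 0xFFFF for ch in s)
    return (6 if wide else 3) * n + 2


samples = [
    "abc" * 10,          # nothing needs encoding
    "\u00e9" * 7,        # case 1: every character needs encoding
    "\u00e9a" * 4,       # case 2: alternating, even length
    "\u00e9a\u00e9",     # case 2: alternating, odd length
    "\U0001F600",        # 4-byte string, encoded as a surrogate pair
    "\U0001F600a" * 3,   # 4-byte string, alternating
]

for s in samples:
    encoded = s.encode("utf-7")
    print(len(s), len(encoded), utf7_upper_bound(s))
    assert len(encoded) <= utf7_upper_bound(s)
```

The odd-length alternating sample is the one that reaches the bound: 3 characters encode to 11 bytes.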

The benefit is small when the string is short. But when the string is long, encoding gets noticeably faster: in the timings below, "abc"*10000 drops from 178 us to 102 us.

Without patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.79 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.55 us +- 0.13 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 14.0 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 178 us +- 1 us

With patch:

[bin]$ ./python3 -m perf timeit -s 's = "abc"*10' 's.encode("utf7")'
....................
Median +- std dev: 2.87 us +- 0.09 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*100' 's.encode("utf7")'
....................
Median +- std dev: 4.50 us +- 0.23 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*1000' 's.encode("utf7")'
....................
Median +- std dev: 13.3 us +- 0.4 us
[bin]$ ./python3 -m perf timeit -s 's = "abc"*10000' 's.encode("utf7")'
....................
Median +- std dev: 102 us +- 1 us
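For readers without the perf module, a rough equivalent of the runs above can be sketched with the standard library's timeit. The helper name median_encode_time and the number/repeats values are assumptions made for this sketch; absolute timings will differ from the figures quoted here and are noisier than perf's.

```python
import timeit


def median_encode_time(repeat_count, number=200, repeats=5):
    """Hypothetical helper: median seconds per s.encode("utf7") call."""
    s = "abc" * repeat_count
    timings = timeit.repeat(lambda: s.encode("utf7"),
                            number=number, repeat=repeats)
    # Each entry is the total for `number` calls; convert to per-call time.
    per_call = sorted(t / number for t in timings)
    return per_call[len(per_call) // 2]


for count in (10, 100, 1000, 10000):
    print(f'"abc"*{count}: {median_encode_time(count) * 1e6:.2f} us')
```

Running it once on an unpatched build and once on a patched build gives numbers that can be compared the same way as the two listings above.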

The patch also removes a redundant check: base64bits can be nonzero only when inShift is nonzero.
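The invariant can be illustrated with a simplified pure-Python model of the encoder loop. This is not the C code from the patch: encode_utf7_model is a hypothetical function, it handles only characters up to U+FFFF, and it assumes that only ASCII letters and digits are written directly (the real encoder also writes other ASCII characters directly and emits "+-" for "+"). The names in_shift and pending_bits stand in for inShift and base64bits.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_utf7_model(s):
    """Simplified model of a UTF-7 encoder loop (illustration only)."""
    out = []
    in_shift = False   # inside a "+...-" base64 run
    pending_bits = 0   # bits buffered but not yet written
    buffer = 0
    for ch in s:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("model handles only characters up to U+FFFF")
        if ch.isascii() and ch.isalnum():
            if in_shift:
                if pending_bits:
                    # Flush the leftover bits, padded with zeros.
                    out.append(B64[(buffer << (6 - pending_bits)) & 0x3F])
                    pending_bits = 0
                    buffer = 0
                in_shift = False
                out.append("-")  # needed because ch is a base64 character
            out.append(ch)
        else:
            if not in_shift:
                out.append("+")
                in_shift = True
            buffer = (buffer << 16) | code
            pending_bits += 16
            while pending_bits >= 6:
                pending_bits -= 6
                out.append(B64[(buffer >> pending_bits) & 0x3F])
            buffer &= (1 << pending_bits) - 1
        # Bits are only ever buffered inside a shift sequence.
        assert pending_bits == 0 or in_shift
    if pending_bits:
        out.append(B64[(buffer << (6 - pending_bits)) & 0x3F])
    if in_shift:
        out.append("-")
    return "".join(out).encode("ascii")


print(encode_utf7_model("\u00e9a\u00e9"))
```

In this model pending_bits only grows in the branch that has already set in_shift, and it is reset before in_shift is cleared, so a separate test of pending_bits outside a shift sequence never fires.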

----------
components: Interpreter Core
files: utf7_encoder.patch
keywords: patch
messages: 279419
nosy: haypo, serhiy.storchaka, xiang.zhang
priority: normal
severity: normal
stage: patch review
status: open
title: Improve utf7 encoder memory usage
type: enhancement
versions: Python 3.7
Added file: http://bugs.python.org/file45219/utf7_encoder.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28531>
_______________________________________

