[issue13621] Unicode performance regression in python3.3 vs python3.2

Sat Dec 17 18:58:09 CET 2011

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

Sorted and grouped results. "replace", "find" and "concat" should be easy to fix, "format" is a little bit more complex, "strip" and "split" depends on "find" performance and require to scan the substring to ensure that the result is canonical (except if inputs are all ASCII, which is the case in these examples).

replace:

- "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%
- "...text.with.2000.lines...replace("\n", " ") (*10): -37.668161%

find:

- ("A"*1000).find("B") (*1000): -30.379747%
- "Andrew"+"Dalke" (*1000): -23.076923%- ("A"*1000).find("B") (*1000): -30.379747%

- "Andrew".startswith("A") (*1000): -20.588235%
- "Andrew".startswith("Anders") (*1000): -23.529412%
- "Andrew".startswith("A") (*1000): -20.588235%
- "Andrew".startswith("Anders") (*1000): -23.529412%

- "Andrew".endswith("w") (*1000): -23.529412%
- "Andrew".endswith("Andrew") (*1000): -22.857143%
- "Andrew".endswith("Anders") (*1000): -23.529412%
- "Andrew".endswith("w") (*1000): -23.529412%
- "Andrew".endswith("Andrew") (*1000): -22.857143%
- "Andrew".endswith("Anders") (*1000): -23.529412%

- "B" in "A"*1000 (*1000): -32.089552%
- "B" in "A"*1000 (*1000): -32.089552%

concat:

- "Andrew"+"Dalke" (*1000): -23.076923%

format:

- "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%
- "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000): -49.411765%

strip:

- "\nHello!\n".strip() (*1000): -33.333333%
- "Hello!\n".strip() (*1000): -35.714286%
- "\nHello!".strip() (*1000): -28.571429%
- "\nHello!\n".strip() (*1000): -33.333333%
- "Hello!\n".strip() (*1000): -35.714286%
- "\nHello!".strip() (*1000): -28.571429%

- "Hello\t   \t".rstrip() (*1000): -33.333333%
- "\t   \tHello".rstrip() (*1000): -33.333333%
- "Hello!\n".rstrip() (*1000): -35.714286%
- "\nHello!".rstrip() (*1000): -35.714286%
- "Hello\t   \t".rstrip() (*1000): -33.333333%
- "\t   \tHello".rstrip() (*1000): -33.333333%
- "Hello!\n".rstrip() (*1000): -35.714286%
- "\nHello!".rstrip() (*1000): -35.714286%

split:

- dna.split("ACTAT") (*10): -21.066667%
- ("Here are some words. "*2).split() (*1000): -22.105263%
- "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
- "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%
- dna.split("ACTAT") (*10): -21.066667%
- ("Here are some words. "*2).split() (*1000): -22.105263%
- "this\nis\na\ntest\n".split("\n") (*1000): -23.437500%
- "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000): -22.429907%

- "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%
- "this\nis\na\ntest\n".rsplit("\n") (*1000): -23.437500%

- ("A"*1000).rpartition("A") (*1000): -21.212121%
- ("A"*1000).rpartition("A") (*1000): -21.212121%

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13621>
_______________________________________