[issue14625] Faster utf-32 decoder
Serhiy Storchaka
report at bugs.python.org
Mon Apr 23 23:01:40 CEST 2012
Serhiy Storchaka <storchaka at gmail.com> added the comment:
Here are the results of benchmarking (numbers in MB/s). The percentage in parentheses after each value is how much faster patchB is than that column.
On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:
                                     Py2.7        Py3.2        Py3.3        patchA       patchB
utf-32le 'A'*10000                   461 (+215%)  454 (+220%)  292 (+398%)  1213 (+20%)  1454
utf-32le '\x80'*10000                458 (+177%)  454 (+180%)  271 (+369%)  1124 (+13%)  1270
utf-32le '\x80'+'A'*9999             462 (+171%)  454 (+175%)  271 (+361%)  1111 (+13%)  1250
utf-32le '\u0100'*10000              457 (+133%)  454 (+135%)  220 (+385%)  1002 (+6%)   1067
utf-32le '\u0100'+'A'*9999           461 (+129%)  454 (+132%)  220 (+379%)  993 (+6%)    1054
utf-32le '\u0100'+'\x80'*9999        458 (+131%)  454 (+133%)  220 (+381%)  1002 (+6%)   1059
utf-32le '\u8000'*10000              457 (+133%)  454 (+135%)  220 (+385%)  1002 (+6%)   1066
utf-32le '\u8000'+'A'*9999           462 (+128%)  454 (+132%)  220 (+379%)  994 (+6%)    1053
utf-32le '\u8000'+'\x80'*9999        457 (+132%)  454 (+134%)  220 (+382%)  1000 (+6%)   1061
utf-32le '\u8000'+'\u0100'*9999      457 (+132%)  454 (+134%)  220 (+382%)  1002 (+6%)   1061
utf-32le '\U00010000'*10000          386 (+167%)  416 (+148%)  212 (+386%)  930 (+11%)   1031
utf-32le '\U00010000'+'A'*9999       461 (+126%)  415 (+152%)  222 (+370%)  940 (+11%)   1044
utf-32le '\U00010000'+'\x80'*9999    458 (+125%)  415 (+148%)  222 (+364%)  930 (+11%)   1031
utf-32le '\U00010000'+'\u0100'*9999  458 (+125%)  415 (+148%)  212 (+386%)  930 (+11%)   1031
utf-32le '\U00010000'+'\u8000'*9999  458 (+125%)  415 (+149%)  222 (+365%)  930 (+11%)   1032
utf-32be 'A'*10000                   461 (+216%)  454 (+221%)  292 (+399%)  1209 (+20%)  1456
utf-32be '\x80'*10000                457 (+177%)  454 (+179%)  271 (+368%)  1125 (+13%)  1268
utf-32be '\x80'+'A'*9999             462 (+171%)  453 (+176%)  271 (+362%)  1112 (+12%)  1251
utf-32be '\u0100'*10000              457 (+144%)  453 (+146%)  220 (+407%)  1048 (+6%)   1116
utf-32be '\u0100'+'A'*9999           462 (+139%)  454 (+143%)  220 (+402%)  1034 (+7%)   1104
utf-32be '\u0100'+'\x80'*9999        459 (+142%)  453 (+145%)  220 (+405%)  1047 (+6%)   1112
utf-32be '\u8000'*10000              457 (+144%)  453 (+147%)  220 (+408%)  1046 (+7%)   1117
utf-32be '\u8000'+'A'*9999           462 (+139%)  454 (+143%)  220 (+402%)  1034 (+7%)   1104
utf-32be '\u8000'+'\x80'*9999        459 (+142%)  453 (+145%)  220 (+405%)  1045 (+6%)   1112
utf-32be '\u8000'+'\u0100'*9999      459 (+142%)  454 (+144%)  220 (+404%)  1047 (+6%)   1109
utf-32be '\U00010000'*10000          386 (+155%)  416 (+137%)  212 (+364%)  940 (+5%)    984
utf-32be '\U00010000'+'A'*9999       461 (+116%)  415 (+140%)  213 (+367%)  948 (+5%)    994
utf-32be '\U00010000'+'\x80'*9999    458 (+115%)  415 (+137%)  222 (+343%)  938 (+5%)    983
utf-32be '\U00010000'+'\u0100'*9999  458 (+115%)  415 (+137%)  212 (+364%)  940 (+5%)    983
utf-32be '\U00010000'+'\u8000'*9999  458 (+115%)  415 (+137%)  222 (+343%)  939 (+5%)    983
On 32-bit Linux, Intel Atom N570 @ 1.66GHz:
                                     Py2.7        Py3.2        Py3.3        patchA       patchB
utf-32le 'A'*10000                   165 (+173%)  165 (+173%)  100 (+350%)  389 (+16%)   450
utf-32le '\x80'*10000                165 (+159%)  165 (+159%)  76 (+462%)   374 (+14%)   427
utf-32le '\x80'+'A'*9999             165 (+161%)  165 (+161%)  76 (+466%)   374 (+15%)   430
utf-32le '\u0100'*10000              165 (+119%)  165 (+119%)  81 (+346%)   333 (+8%)    361
utf-32le '\u0100'+'A'*9999           165 (+120%)  165 (+120%)  81 (+348%)   334 (+9%)    363
utf-32le '\u0100'+'\x80'*9999        165 (+119%)  165 (+119%)  81 (+347%)   334 (+8%)    362
utf-32le '\u8000'*10000              165 (+119%)  165 (+119%)  80 (+352%)   333 (+9%)    362
utf-32le '\u8000'+'A'*9999           165 (+119%)  165 (+119%)  81 (+347%)   334 (+8%)    362
utf-32le '\u8000'+'\x80'*9999        165 (+119%)  165 (+119%)  81 (+347%)   334 (+8%)    362
utf-32le '\u8000'+'\u0100'*9999      165 (+118%)  165 (+118%)  81 (+343%)   333 (+8%)    359
utf-32le '\U00010000'*10000          155 (+130%)  151 (+136%)  80 (+346%)   324 (+10%)   357
utf-32le '\U00010000'+'A'*9999       165 (+117%)  165 (+117%)  80 (+348%)   325 (+10%)   358
utf-32le '\U00010000'+'\x80'*9999    165 (+118%)  165 (+118%)  80 (+349%)   325 (+10%)   359
utf-32le '\U00010000'+'\u0100'*9999  165 (+116%)  165 (+116%)  80 (+346%)   324 (+10%)   357
utf-32le '\U00010000'+'\u8000'*9999  165 (+117%)  165 (+117%)  80 (+348%)   324 (+10%)   358
utf-32be 'A'*10000                   165 (+172%)  165 (+172%)  100 (+348%)  390 (+15%)   448
utf-32be '\x80'*10000                165 (+159%)  165 (+159%)  75 (+469%)   373 (+14%)   427
utf-32be '\x80'+'A'*9999             165 (+160%)  165 (+160%)  75 (+472%)   375 (+14%)   429
utf-32be '\u0100'*10000              165 (+119%)  165 (+119%)  81 (+347%)   334 (+8%)    362
utf-32be '\u0100'+'A'*9999           165 (+120%)  165 (+120%)  81 (+348%)   335 (+8%)    363
utf-32be '\u0100'+'\x80'*9999        165 (+119%)  165 (+119%)  81 (+347%)   335 (+8%)    362
utf-32be '\u8000'*10000              165 (+119%)  165 (+119%)  81 (+347%)   334 (+8%)    362
utf-32be '\u8000'+'A'*9999           165 (+120%)  165 (+120%)  81 (+348%)   334 (+9%)    363
utf-32be '\u8000'+'\x80'*9999        165 (+119%)  165 (+119%)  81 (+347%)   335 (+8%)    362
utf-32be '\u8000'+'\u0100'*9999      165 (+118%)  165 (+118%)  81 (+344%)   335 (+7%)    360
utf-32be '\U00010000'*10000          155 (+130%)  151 (+136%)  80 (+346%)   324 (+10%)   357
utf-32be '\U00010000'+'A'*9999       165 (+117%)  165 (+117%)  80 (+348%)   325 (+10%)   358
utf-32be '\U00010000'+'\x80'*9999    165 (+118%)  165 (+118%)  80 (+349%)   325 (+10%)   359
utf-32be '\U00010000'+'\u0100'*9999  165 (+117%)  165 (+117%)  80 (+348%)   324 (+10%)   358
utf-32be '\U00010000'+'\u8000'*9999  165 (+117%)  165 (+117%)  80 (+348%)   325 (+10%)   358
For scripts see issue14624.
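
For readers without access to those scripts, here is a rough sketch of how a decoding throughput figure like the ones above could be measured. It is not the script from issue14624. The function names decode_speed_mb_per_s and gain_percent are made up for illustration, and it assumes 1 MB = 10**6 bytes of encoded input, which may differ from the convention used for the tables. It needs Python 3.

```python
import timeit


def decode_speed_mb_per_s(text, encoding, repeat=5, number=100):
    """Best observed decoding throughput, in MB of encoded input per second.

    Assumption: 1 MB = 10**6 bytes. The original benchmark may use 2**20.
    """
    data = text.encode(encoding)
    # Take the fastest of several timing runs to reduce scheduling noise.
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6


def gain_percent(baseline, patched):
    """How much faster `patched` is than `baseline`, as a rounded percentage."""
    return round((patched / baseline - 1) * 100)


if __name__ == "__main__":
    samples = ("A" * 10000,
               "\x80" * 10000,
               "\u0100" * 10000,
               "\U00010000" * 10000)
    for encoding in ("utf-32le", "utf-32be"):
        for text in samples:
            speed = decode_speed_mb_per_s(text, encoding)
            print(encoding, ascii(text[:1]) + "*10000", round(speed))
```

As a check of the percentage convention, gain_percent(461, 1454) gives 215 and gain_percent(1213, 1454) gives 20, which matches the first row of the Athlon table.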
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14625>
_______________________________________