[issue14625] Faster utf-32 decoder

Serhiy Storchaka report at bugs.python.org
Fri May 11 21:24:55 CEST 2012


Serhiy Storchaka <storchaka at gmail.com> added the comment:

The patches have been updated for stylistic conformity with the UTF-8 decoder. Patch B is significantly faster for aligned input data (i.e. almost always), especially in native byte order. The UTF-32 decoder can now be faster than the ASCII decoder! Maybe it is time to change the title to "Amazingly faster UTF-32 decoding"? ;)
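To make the byte-order remark concrete, here is a pure-Python sketch. The name `decode_utf32_sketch` is hypothetical, and this is not the code in the patches, which are C. It reads the input as whole 32-bit words and only pays for a byte swap when the data is not in the machine's native order. That is one plausible reason the utf-32le rows below outpace the utf-32be rows, assuming the benchmark ran on a little-endian machine. Pointer alignment, the other factor mentioned, cannot be modelled in Python, so the sketch leaves it out.

```python
import array
import sys


def decode_utf32_sketch(data: bytes, byteorder: str = sys.byteorder) -> str:
    """Decode BOM-less UTF-32 in byte order 'little' or 'big' (hypothetical helper)."""
    if len(data) % 4:
        raise ValueError("truncated data: length is not a multiple of 4")
    # Pick an unsigned array type that is exactly 32 bits wide on this platform.
    code = next((c for c in "IL" if array.array(c).itemsize == 4), None)
    if code is None:
        raise RuntimeError("no 4-byte unsigned array type available")
    units = array.array(code)
    # Reinterpret the input as whole native-order 32-bit words in one step.
    units.frombytes(data)
    if byteorder != sys.byteorder:
        # Non-native input costs one extra pass to swap every word.
        units.byteswap()
    for cp in units:
        # Reject values beyond U+10FFFF and lone surrogates.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point {cp:#x}")
    return "".join(map(chr, units))
```

For example, `decode_utf32_sketch('\U00010000'.encode('utf-32-be'), 'big')` returns `'\U00010000'`. The native-order path skips `byteswap()` entirely.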

codec     input data                      Py3.2         Py3.3         patchA       patchB

utf-32le  'A'*10000                       162 (+462%)   100 (+810%)   391 (+133%)   910
utf-32le      'A'*9999+'\x80'             162 (+411%)   99 (+736%)    377 (+120%)   828
utf-32le      'A'*9999+'\u0100'           162 (+277%)   95 (+543%)    324 (+89%)    611
utf-32le      'A'*9999+'\u8000'           162 (+278%)   95 (+545%)    324 (+89%)    613
utf-32le      'A'*9999+'\U00010000'       162 (+280%)   95 (+547%)    322 (+91%)    615
utf-32le  '\x80'*10000                    162 (+436%)   94 (+823%)    389 (+123%)   868
utf-32le    '\x80'+'A'*9999               162 (+441%)   94 (+832%)    388 (+126%)   876
utf-32le      '\x80'*9999+'\u0100'        162 (+273%)   90 (+571%)    320 (+89%)    604
utf-32le      '\x80'*9999+'\u8000'        162 (+271%)   90 (+568%)    319 (+88%)    601
utf-32le      '\x80'*9999+'\U00010000'    162 (+268%)   90 (+562%)    318 (+87%)    596
utf-32le  '\u0100'*10000                  161 (+445%)   83 (+958%)    405 (+117%)   878
utf-32le    '\u0100'+'A'*9999             162 (+440%)   83 (+954%)    403 (+117%)   875
utf-32le    '\u0100'+'\x80'*9999          162 (+444%)   83 (+963%)    403 (+119%)   882
utf-32le      '\u0100'*9999+'\u8000'      162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le      '\u0100'*9999+'\U00010000'  162 (+259%)   79 (+637%)    325 (+79%)    582
utf-32le  '\u8000'*10000                  162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'A'*9999             162 (+441%)   83 (+955%)    404 (+117%)   876
utf-32le    '\u8000'+'\x80'*9999          161 (+448%)   83 (+964%)    403 (+119%)   883
utf-32le    '\u8000'+'\u0100'*9999        161 (+443%)   83 (+954%)    402 (+118%)   875
utf-32le      '\u8000'*9999+'\U00010000'  162 (+262%)   79 (+643%)    325 (+81%)    587
utf-32le  '\U00010000'*10000              149 (+483%)   83 (+947%)    390 (+123%)   869
utf-32le    '\U00010000'+'A'*9999         162 (+444%)   83 (+963%)    389 (+127%)   882
utf-32le    '\U00010000'+'\x80'*9999      162 (+430%)   83 (+935%)    389 (+121%)   859
utf-32le    '\U00010000'+'\u0100'*9999    162 (+429%)   83 (+933%)    389 (+120%)   857
utf-32le    '\U00010000'+'\u8000'*9999    162 (+431%)   83 (+937%)    388 (+122%)   861

utf-32be  'A'*10000                       162 (+199%)   100 (+384%)   393 (+23%)    484
utf-32be      'A'*9999+'\x80'             162 (+186%)   99 (+368%)    376 (+23%)    463
utf-32be      'A'*9999+'\u0100'           162 (+138%)   95 (+306%)    323 (+20%)    386
utf-32be      'A'*9999+'\u8000'           162 (+139%)   95 (+307%)    323 (+20%)    387
utf-32be      'A'*9999+'\U00010000'       162 (+138%)   95 (+305%)    322 (+20%)    385
utf-32be  '\x80'*10000                    161 (+196%)   94 (+407%)    389 (+23%)    477
utf-32be    '\x80'+'A'*9999               161 (+197%)   94 (+409%)    387 (+24%)    478
utf-32be      '\x80'*9999+'\u0100'        161 (+137%)   90 (+324%)    321 (+19%)    382
utf-32be      '\x80'*9999+'\u8000'        162 (+135%)   89 (+328%)    320 (+19%)    381
utf-32be      '\x80'*9999+'\U00010000'    162 (+134%)   89 (+326%)    318 (+19%)    379
utf-32be  '\u0100'*10000                  161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be    '\u0100'+'A'*9999             161 (+196%)   83 (+475%)    402 (+19%)    477
utf-32be    '\u0100'+'\x80'*9999          162 (+196%)   83 (+477%)    403 (+19%)    479
utf-32be      '\u0100'*9999+'\u8000'      161 (+196%)   83 (+473%)    404 (+18%)    476
utf-32be      '\u0100'*9999+'\U00010000'  162 (+131%)   79 (+373%)    325 (+15%)    374
utf-32be  '\u8000'*10000                  161 (+195%)   83 (+472%)    404 (+18%)    475
utf-32be    '\u8000'+'A'*9999             161 (+197%)   83 (+476%)    402 (+19%)    478
utf-32be    '\u8000'+'\x80'*9999          161 (+197%)   83 (+476%)    403 (+19%)    478
utf-32be    '\u8000'+'\u0100'*9999        162 (+194%)   83 (+473%)    403 (+18%)    476
utf-32be      '\u8000'*9999+'\U00010000'  161 (+133%)   79 (+375%)    325 (+15%)    375
utf-32be  '\U00010000'*10000              148 (+222%)   83 (+473%)    391 (+22%)    476
utf-32be    '\U00010000'+'A'*9999         161 (+198%)   83 (+477%)    389 (+23%)    479
utf-32be    '\U00010000'+'\x80'*9999      162 (+194%)   83 (+473%)    389 (+22%)    476
utf-32be    '\U00010000'+'\u0100'*9999    162 (+194%)   83 (+475%)    389 (+23%)    477
utf-32be    '\U00010000'+'\u8000'*9999    161 (+196%)   83 (+475%)    389 (+23%)    477
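The figures in parentheses appear to be patch B's gain over each column, i.e. (patchB / column - 1) * 100 rounded to the nearest percent. For example, 910 / 162 is about 5.62, which gives +462%. The hypothetical helper `speedup_percent` below recomputes them, which can be handy for checking a row. The report gives no units, so the helper works only with ratios.

```python
def speedup_percent(new: float, old: float) -> int:
    """Percentage by which `new` exceeds `old`, rounded to the nearest percent."""
    return round((new / old - 1) * 100)


# First utf-32le row: 'A'*10000
row = {"Py3.2": 162, "Py3.3": 100, "patchA": 391}
patch_b = 910
for name, value in row.items():
    print(name, f"+{speedup_percent(patch_b, value)}%")
# prints, one per line: Py3.2 +462%, Py3.3 +810%, patchA +133%
```

Rounding, rather than truncating, is needed to match the table. For example, 828 / 377 gives 119.6, which is published as +120%.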

----------
Added file: http://bugs.python.org/file25537/decode_utf32_a_2.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14625>
_______________________________________

