Issue 14625: Faster utf-32 decoder (original) (raw)

Issue14625

Created on 2012-04-19 20:59 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
decode_utf32_a_3.patch serhiy.storchaka,2012-10-20 19:10 review
decode_utf32_b_3.patch serhiy.storchaka,2012-10-20 19:11 review
Messages (14)
msg158749 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-19 20:59
I suggest two variants of patch, accelerating the utf-32 decoder. With PEP 393 utf-32 decoder slowed down up to 2x, these patches returns a performance at the level of Python 3.2 and even much higher (2-3x over 3.2). The variant A is simpler, but the variant B is a little faster (+8-15%).
msg158750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-19 21:03
See also #14624 for UTF-16 decoder.
msg159078 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-04-23 21:01
Here are the results of benchmarking (numbers in MB/s). On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz: Py2.7 Py3.2 Py3.3 patchA patchB utf-32le 'A'*10000 461 (+215%) 454 (+220%) 292 (+398%) 1213 (+20%) 1454 utf-32le '\x80'*10000 458 (+177%) 454 (+180%) 271 (+369%) 1124 (+13%) 1270 utf-32le '\x80'+'A'*9999 462 (+171%) 454 (+175%) 271 (+361%) 1111 (+13%) 1250 utf-32le '\u0100'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1067 utf-32le '\u0100'+'A'*9999 461 (+129%) 454 (+132%) 220 (+379%) 993 (+6%) 1054 utf-32le '\u0100'+'\x80'*9999 458 (+131%) 454 (+133%) 220 (+381%) 1002 (+6%) 1059 utf-32le '\u8000'*10000 457 (+133%) 454 (+135%) 220 (+385%) 1002 (+6%) 1066 utf-32le '\u8000'+'A'*9999 462 (+128%) 454 (+132%) 220 (+379%) 994 (+6%) 1053 utf-32le '\u8000'+'\x80'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1000 (+6%) 1061 utf-32le '\u8000'+'\u0100'*9999 457 (+132%) 454 (+134%) 220 (+382%) 1002 (+6%) 1061 utf-32le '\U00010000'*10000 386 (+167%) 416 (+148%) 212 (+386%) 930 (+11%) 1031 utf-32le '\U00010000'+'A'*9999 461 (+126%) 415 (+152%) 222 (+370%) 940 (+11%) 1044 utf-32le '\U00010000'+'\x80'*9999 458 (+125%) 415 (+148%) 222 (+364%) 930 (+11%) 1031 utf-32le '\U00010000'+'\u0100'*9999 458 (+125%) 415 (+148%) 212 (+386%) 930 (+11%) 1031 utf-32le '\U00010000'+'\u8000'*9999 458 (+125%) 415 (+149%) 222 (+365%) 930 (+11%) 1032 utf-32be 'A'*10000 461 (+216%) 454 (+221%) 292 (+399%) 1209 (+20%) 1456 utf-32be '\x80'*10000 457 (+177%) 454 (+179%) 271 (+368%) 1125 (+13%) 1268 utf-32be '\x80'+'A'*9999 462 (+171%) 453 (+176%) 271 (+362%) 1112 (+12%) 1251 utf-32be '\u0100'*10000 457 (+144%) 453 (+146%) 220 (+407%) 1048 (+6%) 1116 utf-32be '\u0100'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104 utf-32be '\u0100'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1047 (+6%) 1112 utf-32be '\u8000'*10000 457 (+144%) 453 (+147%) 220 (+408%) 1046 (+7%) 1117 utf-32be '\u8000'+'A'*9999 462 (+139%) 454 (+143%) 220 (+402%) 1034 (+7%) 1104 utf-32be '\u8000'+'\x80'*9999 459 (+142%) 453 (+145%) 220 (+405%) 1045 (+6%) 1112 utf-32be '\u8000'+'\u0100'*9999 459 (+142%) 454 (+144%) 220 (+404%) 1047 (+6%) 1109 utf-32be '\U00010000'*10000 386 (+155%) 416 (+137%) 212 (+364%) 940 (+5%) 984 utf-32be '\U00010000'+'A'*9999 461 (+116%) 415 (+140%) 213 (+367%) 948 (+5%) 994 utf-32be '\U00010000'+'\x80'*9999 458 (+115%) 415 (+137%) 222 (+343%) 938 (+5%) 983 utf-32be '\U00010000'+'\u0100'*9999 458 (+115%) 415 (+137%) 212 (+364%) 940 (+5%) 983 utf-32be '\U00010000'+'\u8000'*9999 458 (+115%) 415 (+137%) 222 (+343%) 939 (+5%) 983 On 32-bit Linux, Intel Atom N570 @ 1.66GHz: Py2.7 Py3.2 Py3.3 patchA patchB utf-32le 'A'*10000 165 (+173%) 165 (+173%) 100 (+350%) 389 (+16%) 450 utf-32le '\x80'*10000 165 (+159%) 165 (+159%) 76 (+462%) 374 (+14%) 427 utf-32le '\x80'+'A'*9999 165 (+161%) 165 (+161%) 76 (+466%) 374 (+15%) 430 utf-32le '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+346%) 333 (+8%) 361 utf-32le '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363 utf-32le '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362 utf-32le '\u8000'*10000 165 (+119%) 165 (+119%) 80 (+352%) 333 (+9%) 362 utf-32le '\u8000'+'A'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362 utf-32le '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362 utf-32le '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+343%) 333 (+8%) 359 utf-32le '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357 utf-32le '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358 utf-32le '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359 utf-32le '\U00010000'+'\u0100'*9999 165 (+116%) 165 (+116%) 80 (+346%) 324 (+10%) 357 utf-32le '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358 utf-32be 'A'*10000 165 (+172%) 165 (+172%) 100 (+348%) 390 (+15%) 448 utf-32be '\x80'*10000 165 (+159%) 165 (+159%) 75 (+469%) 373 (+14%) 427 utf-32be '\x80'+'A'*9999 165 (+160%) 165 (+160%) 75 (+472%) 375 (+14%) 429 utf-32be '\u0100'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362 utf-32be '\u0100'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 335 (+8%) 363 utf-32be '\u0100'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362 utf-32be '\u8000'*10000 165 (+119%) 165 (+119%) 81 (+347%) 334 (+8%) 362 utf-32be '\u8000'+'A'*9999 165 (+120%) 165 (+120%) 81 (+348%) 334 (+9%) 363 utf-32be '\u8000'+'\x80'*9999 165 (+119%) 165 (+119%) 81 (+347%) 335 (+8%) 362 utf-32be '\u8000'+'\u0100'*9999 165 (+118%) 165 (+118%) 81 (+344%) 335 (+7%) 360 utf-32be '\U00010000'*10000 155 (+130%) 151 (+136%) 80 (+346%) 324 (+10%) 357 utf-32be '\U00010000'+'A'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358 utf-32be '\U00010000'+'\x80'*9999 165 (+118%) 165 (+118%) 80 (+349%) 325 (+10%) 359 utf-32be '\U00010000'+'\u0100'*9999 165 (+117%) 165 (+117%) 80 (+348%) 324 (+10%) 358 utf-32be '\U00010000'+'\u8000'*9999 165 (+117%) 165 (+117%) 80 (+348%) 325 (+10%) 358 For scripts see .
msg160443 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-05-11 19:24
The patches updated to stylistic conformity of the UTF-8 decoder. Patch B is significantly accelerated for aligned input data (i. e. almost always), especially for natural order. The UTF-32 decoder can now be faster than ASCII decoder! May be it is time to change the title to "Amazingly faster UTF-32 decoding"? ;) Py3.2 Py3.3 patchA patchB utf-32le 'A'*10000 162 (+462%) 100 (+810%) 391 (+133%) 910 utf-32le 'A'*9999+'\x80' 162 (+411%) 99 (+736%) 377 (+120%) 828 utf-32le 'A'*9999+'\u0100' 162 (+277%) 95 (+543%) 324 (+89%) 611 utf-32le 'A'*9999+'\u8000' 162 (+278%) 95 (+545%) 324 (+89%) 613 utf-32le 'A'*9999+'\U00010000' 162 (+280%) 95 (+547%) 322 (+91%) 615 utf-32le '\x80'*10000 162 (+436%) 94 (+823%) 389 (+123%) 868 utf-32le '\x80'+'A'*9999 162 (+441%) 94 (+832%) 388 (+126%) 876 utf-32le '\x80'*9999+'\u0100' 162 (+273%) 90 (+571%) 320 (+89%) 604 utf-32le '\x80'*9999+'\u8000' 162 (+271%) 90 (+568%) 319 (+88%) 601 utf-32le '\x80'*9999+'\U00010000' 162 (+268%) 90 (+562%) 318 (+87%) 596 utf-32le '\u0100'*10000 161 (+445%) 83 (+958%) 405 (+117%) 878 utf-32le '\u0100'+'A'*9999 162 (+440%) 83 (+954%) 403 (+117%) 875 utf-32le '\u0100'+'\x80'*9999 162 (+444%) 83 (+963%) 403 (+119%) 882 utf-32le '\u0100'*9999+'\u8000' 162 (+441%) 83 (+955%) 404 (+117%) 876 utf-32le '\u0100'*9999+'\U00010000' 162 (+259%) 79 (+637%) 325 (+79%) 582 utf-32le '\u8000'*10000 162 (+441%) 83 (+955%) 404 (+117%) 876 utf-32le '\u8000'+'A'*9999 162 (+441%) 83 (+955%) 404 (+117%) 876 utf-32le '\u8000'+'\x80'*9999 161 (+448%) 83 (+964%) 403 (+119%) 883 utf-32le '\u8000'+'\u0100'*9999 161 (+443%) 83 (+954%) 402 (+118%) 875 utf-32le '\u8000'*9999+'\U00010000' 162 (+262%) 79 (+643%) 325 (+81%) 587 utf-32le '\U00010000'*10000 149 (+483%) 83 (+947%) 390 (+123%) 869 utf-32le '\U00010000'+'A'*9999 162 (+444%) 83 (+963%) 389 (+127%) 882 utf-32le '\U00010000'+'\x80'*9999 162 (+430%) 83 (+935%) 389 (+121%) 859 utf-32le '\U00010000'+'\u0100'*9999 162 (+429%) 83 (+933%) 389 (+120%) 857 utf-32le '\U00010000'+'\u8000'*9999 162 (+431%) 83 (+937%) 388 (+122%) 861 utf-32be 'A'*10000 162 (+199%) 100 (+384%) 393 (+23%) 484 utf-32be 'A'*9999+'\x80' 162 (+186%) 99 (+368%) 376 (+23%) 463 utf-32be 'A'*9999+'\u0100' 162 (+138%) 95 (+306%) 323 (+20%) 386 utf-32be 'A'*9999+'\u8000' 162 (+139%) 95 (+307%) 323 (+20%) 387 utf-32be 'A'*9999+'\U00010000' 162 (+138%) 95 (+305%) 322 (+20%) 385 utf-32be '\x80'*10000 161 (+196%) 94 (+407%) 389 (+23%) 477 utf-32be '\x80'+'A'*9999 161 (+197%) 94 (+409%) 387 (+24%) 478 utf-32be '\x80'*9999+'\u0100' 161 (+137%) 90 (+324%) 321 (+19%) 382 utf-32be '\x80'*9999+'\u8000' 162 (+135%) 89 (+328%) 320 (+19%) 381 utf-32be '\x80'*9999+'\U00010000' 162 (+134%) 89 (+326%) 318 (+19%) 379 utf-32be '\u0100'*10000 161 (+196%) 83 (+473%) 404 (+18%) 476 utf-32be '\u0100'+'A'*9999 161 (+196%) 83 (+475%) 402 (+19%) 477 utf-32be '\u0100'+'\x80'*9999 162 (+196%) 83 (+477%) 403 (+19%) 479 utf-32be '\u0100'*9999+'\u8000' 161 (+196%) 83 (+473%) 404 (+18%) 476 utf-32be '\u0100'*9999+'\U00010000' 162 (+131%) 79 (+373%) 325 (+15%) 374 utf-32be '\u8000'*10000 161 (+195%) 83 (+472%) 404 (+18%) 475 utf-32be '\u8000'+'A'*9999 161 (+197%) 83 (+476%) 402 (+19%) 478 utf-32be '\u8000'+'\x80'*9999 161 (+197%) 83 (+476%) 403 (+19%) 478 utf-32be '\u8000'+'\u0100'*9999 162 (+194%) 83 (+473%) 403 (+18%) 476 utf-32be '\u8000'*9999+'\U00010000' 161 (+133%) 79 (+375%) 325 (+15%) 375 utf-32be '\U00010000'*10000 148 (+222%) 83 (+473%) 391 (+22%) 476 utf-32be '\U00010000'+'A'*9999 161 (+198%) 83 (+477%) 389 (+23%) 479 utf-32be '\U00010000'+'\x80'*9999 162 (+194%) 83 (+473%) 389 (+22%) 476 utf-32be '\U00010000'+'\u0100'*9999 162 (+194%) 83 (+475%) 389 (+23%) 477 utf-32be '\U00010000'+'\u8000'*9999 161 (+196%) 83 (+475%) 389 (+23%) 477
msg173405 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:10
Patches updated to 3.4.
msg173406 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:23
I suggest apply patch A to 3.3 as it fixes performance regression (2x) and is very simple.
msg173407 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-20 19:38
Very simple? You're changing most of the code there.
msg173408 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 19:59
It was too complicated code. Actually patched code is smaller. 1 file changed, 71 insertions(+), 80 deletions(-) UTF-16 codec was modified in some way.
msg173409 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-10-20 20:21
That the new code is smaller is no guarantee that it's as correct :) That is exactly the reason we don't put optimizations in bugfix releases.
msg174170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-30 00:59
> I suggest apply patch A to 3.3 as it fixes performance > regression (2x) and is very simple. ASCII and UTF-8 are the two most common codecs in the world, so it's justified to have heavily optimized encoders and decoders. I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough. -- 32 bit units is commonly used with wchar_t, but this format already has a fast decoder, PyUnicode_FromWideChar(), which uses memcpy() or _PyUnicode_CONVERT_BYTES().
msg174188 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 09:29
> I don't know any application using UTF-32-LE or UTF-32-BE. So I don't want to waste Python memory/code size with a heavily optimized decoder. The patch A looks to be enough. Agree. I had the same doubts. That's why I proposed two patches for your choice.
msg174233 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012-10-30 22:24
New changeset 9badfe3a31a7 by Victor Stinner in branch 'default': Close #14625: Rewrite the UTF-32 decoder. It is now 3x to 4x faster http://hg.python.org/cpython/rev/9badfe3a31a7
msg174234 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-10-30 22:26
I applied the patch "A" with minor changes: replace multiple goto with classic break/continue and if/else.
msg174239 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-30 23:27
> I applied the patch "A" with minor changes: replace multiple goto with > classic break/continue and if/else. Looks good. Thanks.
History
Date User Action Args
2022-04-11 14:57:29 admin set github: 58830
2012-10-30 23:27:45 serhiy.storchaka set messages: +
2012-10-30 22:26:24 vstinner set messages: +
2012-10-30 22:24:09 python-dev set status: open -> closednosy: + python-devmessages: + resolution: fixedstage: patch review -> resolved
2012-10-30 09:29:56 serhiy.storchaka set messages: +
2012-10-30 00:59:28 vstinner set messages: +
2012-10-24 09:02:12 serhiy.storchaka set stage: patch review
2012-10-20 20:21:48 georg.brandl set messages: +
2012-10-20 19:59:49 serhiy.storchaka set messages: +
2012-10-20 19:38:13 georg.brandl set messages: +
2012-10-20 19:25:53 serhiy.storchaka set nosy: + georg.brandl
2012-10-20 19:23:40 serhiy.storchaka set messages: + versions: + Python 3.3
2012-10-20 19:11:26 serhiy.storchaka set keywords: + 3.3regressionfiles: + decode_utf32_b_3.patch
2012-10-20 19:10:07 serhiy.storchaka set keywords: + needs reviewfiles: + decode_utf32_a_3.patchmessages: + versions: + Python 3.4, - Python 3.3
2012-10-20 19:07:24 serhiy.storchaka set files: - decode_utf32_b_2.patch
2012-10-20 19:06:51 serhiy.storchaka set files: - decode_utf32_a_2.patch
2012-10-20 19:06:34 serhiy.storchaka set files: - decode_utf32_b.patch
2012-10-20 19:06:21 serhiy.storchaka set files: - decode_utf32_a.patch
2012-05-11 19:46:42 serhiy.storchaka set nosy: + ezio.melotticomponents: + Unicode
2012-05-11 19:25:28 serhiy.storchaka set files: + decode_utf32_b_2.patch
2012-05-11 19:24:54 serhiy.storchaka set files: + decode_utf32_a_2.patchmessages: +
2012-04-23 21:01:40 serhiy.storchaka set messages: +
2012-04-20 21:38:12 asvetlov set nosy: + asvetlov
2012-04-20 06:36:35 Arfrever set nosy: + Arfrever
2012-04-19 21:03:20 vstinner set nosy: + pitrou, vstinnermessages: +
2012-04-19 21:00:28 serhiy.storchaka set files: + decode_utf32_b.patch
2012-04-19 20:59:56 serhiy.storchaka create