msg158748 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-19 20:58 |
I propose a patch that speeds up the UTF-16 decoder. With PEP 393 the UTF-16 decoder became 3-4x slower; this patch brings performance back to the level of Python 3.2 and even above it (+10-30% over 3.2). In addition, it fixes a few bugs in the UTF-16 decoder. As a side effect, other decoders may also become faster. |
|
|
msg158751 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2012-04-19 21:03 |
See also #14625 for UTF-32 decoder. |
|
|
msg158753 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-19 21:09 |
See also issue #14579 for utf-16 decoder bugs. |
|
|
msg158772 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2012-04-19 23:08 |
Serhiy: can you please submit a contributor form? |
|
|
msg159077 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-04-23 21:01 |
Here are the results of benchmarking (numbers in MB/s). Each percentage is the change of the patched build relative to the column it follows.

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

                                     Py2.7         Py3.2         Py3.3         patch
utf-16le 'A'*10000                    504 (+282%)  1905   (+1%)   565 (+241%)  1927
utf-16le '\x80'*10000                 503 (+264%)  1894   (-3%)   417 (+340%)  1833
utf-16le '\x80'+'A'*9999              504 (+264%)  1890   (-3%)   422 (+335%)  1834
utf-16le '\u0100'*10000               503 (+249%)  1896   (-7%)   357 (+391%)  1754
utf-16le '\u0100'+'A'*9999            504 (+252%)  1896   (-6%)   360 (+393%)  1776
utf-16le '\u0100'+'\x80'*9999         503 (+249%)  1890   (-7%)   357 (+392%)  1756
utf-16le '\u8000'*10000               503  (-18%)   355  (+16%)    75 (+449%)   412
utf-16le '\u8000'+'A'*9999            504 (+254%)  1892   (-6%)   359 (+397%)  1783
utf-16le '\u8000'+'\x80'*9999         503 (+249%)  1896   (-7%)   357 (+392%)  1755
utf-16le '\u8000'+'\u0100'*9999       503 (+258%)  1901   (-5%)   359 (+402%)  1802
utf-16le '\U00010000'*10000           484  (-14%)   379   (+9%)   103 (+303%)   415
utf-16le '\U00010000'+'A'*9999        504 (+244%)  1905   (-9%)   353 (+392%)  1735
utf-16le '\U00010000'+'\x80'*9999     503 (+245%)  1899   (-9%)   348 (+398%)  1733
utf-16le '\U00010000'+'\u0100'*9999   503 (+244%)  1882   (-8%)   348 (+397%)  1729
utf-16le '\U00010000'+'\u8000'*9999   503  (-18%)   355  (+16%)    71 (+482%)   413
utf-16be 'A'*10000                    504 (+284%)  1553  (+24%)   469 (+312%)  1933
utf-16be '\x80'*10000                 504 (+251%)  1551  (+14%)   387 (+357%)  1770
utf-16be '\x80'+'A'*9999              504 (+261%)  1549  (+17%)   386 (+371%)  1819
utf-16be '\u0100'*10000               503 (+175%)  1544  (-10%)   333 (+316%)  1384
utf-16be '\u0100'+'A'*9999            505 (+178%)  1548   (-9%)   335 (+319%)  1403
utf-16be '\u0100'+'\x80'*9999         503 (+179%)  1552   (-9%)   336 (+318%)  1405
utf-16be '\u8000'*10000               503   (-2%)   415  (+19%)    75 (+559%)   494
utf-16be '\u8000'+'A'*9999            504 (+179%)  1551   (-9%)   335 (+320%)  1408
utf-16be '\u8000'+'\x80'*9999         504 (+178%)  1551  (-10%)   336 (+317%)  1402
utf-16be '\u8000'+'\u0100'*9999       504 (+179%)  1549   (-9%)   336 (+318%)  1404
utf-16be '\U00010000'*10000           483   (-7%)   407  (+10%)   105 (+326%)   447
utf-16be '\U00010000'+'A'*9999        504 (+149%)  1554  (-19%)   317 (+295%)  1253
utf-16be '\U00010000'+'\x80'*9999     503 (+153%)  1543  (-17%)   317 (+302%)  1275
utf-16be '\U00010000'+'\u0100'*9999   503 (+153%)  1537  (-17%)   317 (+302%)  1274
utf-16be '\U00010000'+'\u8000'*9999   503   (-2%)   415  (+19%)    71 (+597%)   495

On 32-bit Linux, Intel Atom N570 @ 1.66GHz:

                                     Py2.7         Py3.2         Py3.3         patch
utf-16le 'A'*10000                    136 (+417%)   584  (+20%)   184 (+282%)   703
utf-16le '\x80'*10000                 136 (+392%)   580  (+15%)   160 (+318%)   669
utf-16le '\x80'+'A'*9999              136 (+398%)   582  (+16%)   159 (+326%)   677
utf-16le '\u0100'*10000               137 (+346%)   583   (+5%)   129 (+374%)   611
utf-16le '\u0100'+'A'*9999            136 (+358%)   582   (+7%)   129 (+383%)   623
utf-16le '\u0100'+'\x80'*9999         136 (+348%)   580   (+5%)   129 (+372%)   609
utf-16le '\u8000'*10000               136  (+18%)   127  (+27%)    38 (+324%)   161
utf-16le '\u8000'+'A'*9999            136 (+357%)   582   (+7%)   129 (+382%)   622
utf-16le '\u8000'+'\x80'*9999         136 (+351%)   581   (+6%)   128 (+380%)   614
utf-16le '\u8000'+'\u0100'*9999       136 (+349%)   581   (+5%)   129 (+374%)   611
utf-16le '\U00010000'*10000           153   (-3%)   140   (+6%)    53 (+181%)   149
utf-16le '\U00010000'+'A'*9999        136 (+296%)   581   (-7%)   131 (+311%)   538
utf-16le '\U00010000'+'\x80'*9999     136 (+289%)   584   (-9%)   131 (+304%)   529
utf-16le '\U00010000'+'\u0100'*9999   136 (+290%)   579   (-8%)   130 (+308%)   530
utf-16le '\U00010000'+'\u8000'*9999   136  (+25%)   128  (+33%)    38 (+347%)   170
utf-16be 'A'*10000                    136 (+331%)   441  (+33%)   166 (+253%)   586
utf-16be '\x80'*10000                 136 (+309%)   440  (+26%)   145 (+283%)   556
utf-16be '\x80'+'A'*9999              136 (+312%)   442  (+27%)   145 (+286%)   560
utf-16be '\u0100'*10000               136 (+231%)   441   (+2%)   120 (+275%)   450
utf-16be '\u0100'+'A'*9999            136 (+232%)   442   (+2%)   120 (+276%)   451
utf-16be '\u0100'+'\x80'*9999         136 (+231%)   438   (+3%)   119 (+278%)   450
utf-16be '\u8000'*10000               136  (+22%)   127  (+31%)    38 (+337%)   166
utf-16be '\u8000'+'A'*9999            136 (+232%)   439   (+3%)   120 (+276%)   451
utf-16be '\u8000'+'\x80'*9999         136 (+230%)   439   (+2%)   120 (+274%)   449
utf-16be '\u8000'+'\u0100'*9999       136 (+232%)   439   (+3%)   120 (+276%)   451
utf-16be '\U00010000'*10000           153   (-1%)   139   (+9%)    52 (+192%)   152
utf-16be '\U00010000'+'A'*9999        136 (+211%)   440   (-4%)   121 (+250%)   423
utf-16be '\U00010000'+'\x80'*9999     136 (+210%)   440   (-4%)   122 (+246%)   422
utf-16be '\U00010000'+'\u0100'*9999   136 (+210%)   441   (-5%)   121 (+248%)   421
utf-16be '\U00010000'+'\u8000'*9999   136  (+27%)   128  (+35%)    38 (+355%)   173 |
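
For readers who want to reproduce figures of this kind, here is a minimal sketch of a throughput measurement. It is not the script used for the numbers above, which is not shown in this thread. The function name `decode_throughput`, the repeat counts, and the choice of 1 MB = 10**6 encoded bytes are all assumptions made for illustration.

```python
import timeit


def decode_throughput(encoding, text, repeat=3, number=200):
    """Return the best observed decoding speed in MB/s.

    Assumption: 1 MB = 10**6 bytes of *encoded* input; the thread does not
    say which definition the original benchmark used.
    """
    data = text.encode(encoding)
    # timeit.repeat returns one total time per repetition; the minimum is
    # the least disturbed measurement.
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6


if __name__ == "__main__":
    # A few of the input shapes used in the tables above.
    samples = [
        ("'A'*10000", "A" * 10000),
        ("'\\x80'*10000", "\x80" * 10000),
        ("'\\u0100'*10000", "\u0100" * 10000),
        ("'\\u8000'*10000", "\u8000" * 10000),
        ("'\\U00010000'*10000", "\U00010000" * 10000),
    ]
    for encoding in ("utf-16le", "utf-16be"):
        for label, text in samples:
            speed = decode_throughput(encoding, text)
            print("%-9s %-22s %8.0f MB/s" % (encoding, label, speed))
```

Absolute numbers depend heavily on the machine and build, so only ratios between interpreters measured on the same machine are meaningful.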
|
|
msg159090 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-04-23 21:57 |
64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

                                     vanilla 3.3   patched
utf-16le 'A'*10000                   1384 (+278%)  5233
utf-16le 'A'*9999+'\x80'             1303 (+259%)  4684
utf-16le 'A'*9999+'\u0100'            953 (+195%)  2813
utf-16le 'A'*9999+'\u8000'            953 (+195%)  2814
utf-16le 'A'*9999+'\U00010000'        979 (+197%)  2903
utf-16le '\x80'*10000                1243 (+321%)  5230
utf-16le '\x80'+'A'*9999             1256 (+313%)  5188
utf-16le '\x80'*9999+'\u0100'         880 (+214%)  2765
utf-16le '\x80'*9999+'\u8000'         880 (+214%)  2763
utf-16le '\x80'*9999+'\U00010000'     899 (+218%)  2860
utf-16le '\u0100'*10000              1047 (+370%)  4917
utf-16le '\u0100'+'A'*9999           1046 (+369%)  4906
utf-16le '\u0100'+'\x80'*9999        1047 (+370%)  4920
utf-16le '\u0100'*9999+'\u8000'      1047 (+369%)  4906
utf-16le '\u0100'*9999+'\U00010000'   791 (+253%)  2793
utf-16le '\u8000'*10000               230 (+410%)  1173
utf-16le '\u8000'+'A'*9999           1043 (+371%)  4911
utf-16le '\u8000'+'\x80'*9999        1044 (+345%)  4645
utf-16le '\u8000'+'\u0100'*9999      1041 (+350%)  4681
utf-16le '\u8000'*9999+'\U00010000'   215 (+357%)   983
utf-16le '\U00010000'*10000           362 (+170%)   976
utf-16le '\U00010000'+'A'*9999        985 (+210%)  3052
utf-16le '\U00010000'+'\x80'*9999     985 (+211%)  3066
utf-16le '\U00010000'+'\u0100'*9999   983 (+209%)  3042
utf-16le '\U00010000'+'\u8000'*9999   245 (+329%)  1052
utf-16be 'A'*10000                   1268 (+313%)  5240
utf-16be 'A'*9999+'\x80'             1199 (+297%)  4758
utf-16be 'A'*9999+'\u0100'            896 (+211%)  2786
utf-16be 'A'*9999+'\u8000'            897 (+211%)  2788
utf-16be 'A'*9999+'\U00010000'        919 (+214%)  2885
utf-16be '\x80'*10000                1154 (+341%)  5087
utf-16be '\x80'+'A'*9999             1155 (+343%)  5112
utf-16be '\x80'*9999+'\u0100'         829 (+229%)  2728
utf-16be '\x80'*9999+'\u8000'         828 (+229%)  2726
utf-16be '\x80'*9999+'\U00010000'     852 (+232%)  2832
utf-16be '\u0100'*10000               981 (+332%)  4241
utf-16be '\u0100'+'A'*9999            981 (+330%)  4220
utf-16be '\u0100'+'\x80'*9999         977 (+331%)  4213
utf-16be '\u0100'*9999+'\u8000'       982 (+331%)  4237
utf-16be '\u0100'*9999+'\U00010000'   748 (+237%)  2520
utf-16be '\u8000'*10000               230 (+413%)  1180
utf-16be '\u8000'+'A'*9999            979 (+331%)  4218
utf-16be '\u8000'+'\x80'*9999         974 (+333%)  4215
utf-16be '\u8000'+'\u0100'*9999       972 (+335%)  4226
utf-16be '\u8000'*9999+'\U00010000'   215 (+361%)   992
utf-16be '\U00010000'*10000           362 (+170%)   978
utf-16be '\U00010000'+'A'*9999        924 (+232%)  3064
utf-16be '\U00010000'+'\x80'*9999     921 (+223%)  2979
utf-16be '\U00010000'+'\u0100'*9999   921 (+233%)  3064
utf-16be '\U00010000'+'\u8000'*9999   245 (+329%)  1052 |
|
|
msg159847 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-05-03 10:34 |
New changeset 830eeff4fe8f by Victor Stinner in branch 'default': Issue #14624, #14687: Optimize unicode_widen() http://hg.python.org/cpython/rev/830eeff4fe8f |
|
|
msg159858 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-05-03 13:21 |
Here is an updated patch that takes into account that unicode_widen is already optimized. |
|
|
msg160442 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-05-11 19:24 |
The patch has been updated to match the style of the UTF-8 decoder. Decoding of UCS2 non-surrogate characters is now slightly faster (+15%). |
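
To make the distinction between UCS2 non-surrogate characters and surrogate pairs concrete, here is a minimal pure-Python sketch of strict UTF-16-LE decoding. It is only a reference illustration of what the decoder must compute, not the C code from the patch; the function name `decode_utf16le` and its error messages are hypothetical.

```python
import struct


def decode_utf16le(data: bytes) -> str:
    """Decode UTF-16-LE bytes strictly, one 16-bit code unit at a time."""
    if len(data) % 2:
        raise ValueError("truncated data")
    # Unpack the input as little-endian unsigned 16-bit code units.
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                code = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                chars.append(chr(code))
                i += 2
                continue
            raise ValueError("unpaired high surrogate")
        if 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("unexpected low surrogate")
        # Non-surrogate code unit: maps directly to a BMP character.
        chars.append(chr(unit))
        i += 1
    return "".join(chars)
```

The non-surrogate branch is the common case for most text, which is why speeding it up matters for the benchmark rows dominated by '\u0100' or '\u8000'.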
|
|
msg160572 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-05-13 20:38 |
New performance figures under 64 bit Linux, Intel Core i5-2500K @ 3.30GHz:

                                     vanilla 3.3   patched
utf-16le 'A'*10000                   1411 (+290%)  5504
utf-16le 'A'*9999+'\x80'             1368 (+263%)  4970
utf-16le 'A'*9999+'\u0100'           1145 (+151%)  2871
utf-16le 'A'*9999+'\u8000'           1144 (+151%)  2870
utf-16le 'A'*9999+'\U00010000'       1164 (+154%)  2957
utf-16le '\x80'*10000                1403 (+271%)  5209
utf-16le '\x80'+'A'*9999             1406 (+272%)  5235
utf-16le '\x80'*9999+'\u0100'        1138 (+138%)  2713
utf-16le '\x80'*9999+'\u8000'        1138 (+139%)  2716
utf-16le '\x80'*9999+'\U00010000'    1155 (+151%)  2897
utf-16le '\u0100'*10000              1477 (+243%)  5062
utf-16le '\u0100'+'A'*9999           1478 (+243%)  5072
utf-16le '\u0100'+'\x80'*9999        1477 (+243%)  5062
utf-16le '\u0100'*9999+'\u8000'      1478 (+242%)  5055
utf-16le '\u0100'*9999+'\U00010000'  1201 (+131%)  2776
utf-16le '\u8000'*10000               246 (+347%)  1100
utf-16le '\u8000'+'A'*9999           1475 (+244%)  5069
utf-16le '\u8000'+'\x80'*9999        1474 (+243%)  5062
utf-16le '\u8000'+'\u0100'*9999      1473 (+243%)  5057
utf-16le '\u8000'*9999+'\U00010000'   236 (+295%)   932
utf-16le '\U00010000'*10000           393 (+164%)  1039
utf-16le '\U00010000'+'A'*9999       1325 (+134%)  3106
utf-16le '\U00010000'+'\x80'*9999    1326 (+134%)  3103
utf-16le '\U00010000'+'\u0100'*9999  1326 (+134%)  3104
utf-16le '\U00010000'+'\u8000'*9999   253 (+331%)  1091
utf-16be 'A'*10000                   1341 (+298%)  5342
utf-16be 'A'*9999+'\x80'             1305 (+275%)  4888
utf-16be 'A'*9999+'\u0100'           1101 (+157%)  2834
utf-16be 'A'*9999+'\u8000'           1102 (+157%)  2831
utf-16be 'A'*9999+'\U00010000'       1115 (+162%)  2917
utf-16be '\x80'*10000                1326 (+296%)  5253
utf-16be '\x80'+'A'*9999             1322 (+298%)  5258
utf-16be '\x80'*9999+'\u0100'        1088 (+156%)  2781
utf-16be '\x80'*9999+'\u8000'        1088 (+155%)  2770
utf-16be '\x80'*9999+'\U00010000'    1103 (+159%)  2854
utf-16be '\u0100'*10000              1344 (+221%)  4308
utf-16be '\u0100'+'A'*9999           1342 (+223%)  4330
utf-16be '\u0100'+'\x80'*9999        1343 (+221%)  4307
utf-16be '\u0100'*9999+'\u8000'      1343 (+221%)  4306
utf-16be '\u0100'*9999+'\U00010000'  1109 (+128%)  2529
utf-16be '\u8000'*10000               248 (+341%)  1094
utf-16be '\u8000'+'A'*9999           1340 (+223%)  4331
utf-16be '\u8000'+'\x80'*9999        1341 (+221%)  4307
utf-16be '\u8000'+'\u0100'*9999      1341 (+221%)  4309
utf-16be '\u8000'*9999+'\U00010000'   239 (+290%)   931
utf-16be '\U00010000'*10000           399 (+160%)  1037
utf-16be '\U00010000'+'A'*9999       1230 (+152%)  3101
utf-16be '\U00010000'+'\x80'*9999    1218 (+154%)  3095
utf-16be '\U00010000'+'\u0100'*9999  1220 (+154%)  3095
utf-16be '\U00010000'+'\u8000'*9999   257 (+318%)  1074 |
|
|
msg160672 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-05-14 22:14 |
The patch has been updated with slightly clarified code and added comments. |
|
|
msg160766 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-05-15 21:29 |
Here are two new patches. The check for out-of-range characters has been moved, which makes the code simpler. In theory this slightly slows down decoding of short UCS1 strings (up to 1 and 3 chars on 32- and 64-bit respectively), but in practice there is no difference. The second patch differs from the first in that the masks are not calculated but specified explicitly. I am not sure that it improves readability. The committer has the choice. |
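
The difference between a calculated mask and an explicitly specified one can be sketched as follows. This is an illustration only, assuming the masks are used to test several 16-bit code units packed into one machine word at once; the actual masks and C code in the patches are not shown in this thread. The names `high_byte_mask`, `MASK_32`, `MASK_64` and `all_below_0x100` are hypothetical.

```python
def high_byte_mask(word_size: int) -> int:
    """Calculate a mask selecting the high byte of every 16-bit unit
    in a word of word_size bytes."""
    mask = 0
    for i in range(word_size // 2):
        mask |= 0xFF00 << (16 * i)
    return mask


# The same masks written out explicitly for 32-bit and 64-bit words.
MASK_32 = 0xFF00FF00
MASK_64 = 0xFF00FF00FF00FF00


def all_below_0x100(chunk: bytes) -> bool:
    """Return True if every UTF-16-LE code unit in chunk is below 0x100.

    chunk is assumed to be exactly one machine word (4 or 8 bytes), so one
    AND operation tests 2 or 4 code units at once.
    """
    word = int.from_bytes(chunk, "little")
    return word & high_byte_mask(len(chunk)) == 0
```

Under this assumption, a word holds 2 code units on 32-bit and 4 on 64-bit platforms, which would fit the remark about short strings of up to 1 and 3 chars not filling a whole word.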
|
|
msg160768 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-05-15 21:50 |
New changeset cdcc816dea85 by Antoine Pitrou in branch 'default': Issue #14624: UTF-16 decoding is now 3x to 4x faster on various inputs. http://hg.python.org/cpython/rev/cdcc816dea85 |
|
|
msg160769 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-05-15 21:52 |
Thank you Serhiy! Now committed. |
|
|
msg161100 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-05-19 09:02 |
Thank you, Antoine. Now only waits for review.

> changeset: 77012:3430d7329a3b
> +* UTF-8 and UTF-16 decoding is now 2x to 4x faster.

In fact, UTF-16 decoding is now at most 25% faster than in Python 3.2 on my computers (and sometimes still a little slower). It is 2x to 4x faster only compared to the earlier, slowed-down Python 3.3 (thanks to PEP 393). |
|
|