msg145372 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-11 23:14 |
This patch optimizes scanning for the max character width in an unicode buffer. Micro-benchmarking some worst case situations: $ ./python -m timeit -s "x='é'+'x'*100000" "x[1:]" -> before: 10000 loops, best of 3: 74.9 usec per loop -> after: 100000 loops, best of 3: 15.5 usec per loop $ ./python -m timeit -s "x='\U00010000'+'x'*100000" "x[1:]" -> before: 10000 loops, best of 3: 138 usec per loop -> after: 10000 loops, best of 3: 71.3 usec per loop |
|
|
msg145373 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-11 23:22 |
I hadn't noticed the STRINGLIB_SIZEOF_CHAR constant. Reuse it instead of adding STRINGLIB_CHAR_SIZE. |
|
|
msg145374 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-11 23:32 |
Slightly cleaned up patch after Victor's comments in private. |
|
|
msg145375 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-10-11 23:42 |
find_max_char() returns 0x10000 instead of 0x10FFFF, which may be wrong (or at least, surprising). You may add a max_char variable using other macros like MAX_CHAR_ASCII, MAX_CHAR_UCS1, ..., which will be set at the same time than mask. Or restore your if (ret >= 0x10000) return 0x10ffff; else return ret;. Constants look inconsistent: + const STRINGLIB_CHAR *unrolled_end = begin + (n & ~ (Py_ssize_t) 7); + p += 4; You may use STRINGLIB_CHAR in: + Py_UCS4 bits = p[0] | p[1] |
p[2] |
p[3]; |
msg145376 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-11 23:50 |
Ok, updated patch. |
|
|
msg145377 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-10-12 00:11 |
> Ok, updated patch. "ret = ~mask + 1;" looks wrong: (~0xFFFFFF80+1) gives 128, not 127. I don't see why you need: + if (ret < 128) + return 127; + if (ret < 256) + return 255; #undef ASCII_CHAR_MASK should be #undef UCS1_ASCII_CHAR_MASK #error Invalid STRINGLIB_SIZEOF_CHAR (must be 1, 2 or 4) should be #error Invalid STRINGLIB_SIZEOF_CHAR (must be 2 or 4) Why do you need these forward declarations? It's maybe related to another patch? static PyObject * +unicode_fromascii(const unsigned char *s, Py_ssize_t size); +static PyObject * +_PyUnicode_FromUCS1(const unsigned char *s, Py_ssize_t size); +static PyObject * +_PyUnicode_FromUCS2(const Py_UCS2 *s, Py_ssize_t size); +static PyObject * +_PyUnicode_FromUCS4(const Py_UCS4 *s, Py_ssize_t size); (You kept Py_UCS4 for "Py_UCS4 bits", but Py_UCS4 is maybe just fine.) By the way, your function rocks :-) Use bit masks is a great idea, especially your "bits = p[0] | p[1] |
p[2] |
p[3]" "hack". |
msg145378 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-12 00:24 |
> > Ok, updated patch. > > "ret = ~mask + 1;" looks wrong: (~0xFFFFFF80+1) gives 128, not 127. That's on purpose, since the mask has just matched. If 0xFFFFFF80 matches, then the max char can't be 127, it has to be at least 128. > I don't see why you need: > > + if (ret < 128) > + return 127; > + if (ret < 256) > + return 255; Because otherwise you're returning 0xFFFF in the following line: if (ret < 0x10000) return 0xFFFF; > #undef ASCII_CHAR_MASK should be #undef UCS1_ASCII_CHAR_MASK Ha, funny that gcc doesn't say anything. I also forgot to #under MASK_*. > #error Invalid STRINGLIB_SIZEOF_CHAR (must be 1, 2 or 4) > should be > #error Invalid STRINGLIB_SIZEOF_CHAR (must be 2 or 4) Well, no. 1 is a legal value, even though it's handled elsewhere, so the error message is right. > Why do you need these forward declarations? It's maybe related to another patch? It's because I'm moving the stringlib #includes up in unicodeobject.c, and stringlib calls these functions. |
|
|
msg145427 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-10-12 20:21 |
Without the patch: python3.2 -m timeit 'x="é"+"x"*10000' 'x[1:]' 100000 loops, best of 3: 2.18 usec per loop python3.3 -m timeit 'x="é"+"x"*10000' 'x[1:]' 100000 loops, best of 3: 6.68 usec per loop |
|
|
msg145428 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-10-12 20:22 |
With find_max_char4.patch: python3.3 -m timeit 'x="é"+"x"*10000' 'x[1:]' 100000 loops, best of 3: 1.96 usec per loop |
|
|
msg145432 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2011-10-12 21:10 |
find_max_char5.patch: - don't use adjusted ~mask+1: "precompute" the right max_char - rename findwidth.h to find_max_char.h - add some #undef |
|
|
msg145437 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2011-10-12 22:08 |
New changeset 9c7d3207fc15 by Antoine Pitrou in branch 'default': Issue #13155: Optimize finding the optimal character width of an unicode string http://hg.python.org/cpython/rev/9c7d3207fc15 |
|
|
msg145438 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-10-12 22:08 |
> find_max_char5.patch: > - don't use adjusted ~mask+1: "precompute" the right max_char > - rename findwidth.h to find_max_char.h > - add some #undef Thank you, I've committed this version. |
|
|