msg173301 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2012-10-19 00:51 |
Oh oh, it looks like the implementation of tailmatch() was not finished: /* If both are of the same kind, memcmp is sufficient */ if (kind_self == kind_sub) { return ...; } /* otherwise we have to compare each character by first accesing it */ else { /* We do not need to compare 0 and len(substring)-1 because the if statement above ensured already that they are equal when we end up here. */ /* TODO: honor direction and do a forward or backwards search */ for (i = 1; i < end_sub; ++i) { if (PyUnicode_READ(kind_self, data_self, offset + i) != PyUnicode_READ(kind_sub, data_sub, i)) return 0; } return 1; } |
|
|
msg174441 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-11-01 19:08 |
The result does not depend on the direction of comparison. This only affects speed. But who can to say in which direction comparison will be faster? Here I see a one obvious opportunity for optimization: if (kind_self < kind_sub) return 0; After that and after processing the case (kind_self == kind_sub) only 3 special cases left: UCS1 in UCS2, UCS1 in UCS4, and UCS2 in UCS4. Get rid of slow PyUnicode_READ() for this cases will speed up the code. Also note that comparing first and last characters before memcmp can be a slowdown (because PyUnicode_READ() is slow). Try to compare first and last bytes. |
|
|
msg174847 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2012-11-04 23:51 |
Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function can fail. |
|
|
msg174884 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-11-05 09:22 |
> Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function > can fail. But it does. .. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \ Py_ssize_t start, Py_ssize_t end, int direction) Return 1 if *substr* matches ``str[start:end]`` at the given tail end (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match), 0 otherwise. Return ``-1`` if an error occurred. |
|
|
msg174893 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2012-11-05 11:36 |
>> Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function >> can fail. > > But it does. > > .. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \ > Py_ssize_t start, Py_ssize_t end, int direction) > > Return 1 if *substr* matches ``str[start:end]`` at the given tail end > (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match), > 0 otherwise. Return ``-1`` if an error occurred. Oh, I read the "documentation" in unicodeobject.h: /* Return 1 if substr matches str[start:end] at the given tail end, 0 otherwise. */ The problem is that tailmatch() returns 0 if PyUnicode_READY() failed. It is a bug, even if it cannot occur because PyUnicode_Tailmatch() checks that the both strings are ready. |
|
|
msg178901 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-01-03 02:20 |
New changeset 49eb2488145d by Victor Stinner in branch 'default': Close #16281: handle tailmatch() failure and remove useless comment http://hg.python.org/cpython/rev/49eb2488145d |
|
|
msg178902 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-01-03 02:21 |
"Here I see a one obvious opportunity for optimization: ..." @Serhiy: Can you please open a new issue for this? I consider the issue as fixed: I just removed the TODO (for the reason explained in the changeset). |
|
|
msg178941 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-03 12:43 |
Shouldn't this be applied to 3.3? As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope. |
|
|
msg178944 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2013-01-03 13:38 |
> Shouldn't this be applied to 3.3? It's just a cleanup, it doesn't fix any real bug. I prefer to not pollute old versions with cleanup. > As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope. Ok, agreed. |
|
|