Issue 16281: TODO in tailmatch(): it does not support backward in all cases (original) (raw)

Created on 2012-10-19 00:51 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (9)
msg173301 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-10-19 00:51
Oh oh, it looks like the implementation of tailmatch() was not finished: /* If both are of the same kind, memcmp is sufficient / if (kind_self == kind_sub) { return ...; } / otherwise we have to compare each character by first accesing it / else { / We do not need to compare 0 and len(substring)-1 because the if statement above ensured already that they are equal when we end up here. / / TODO: honor direction and do a forward or backwards search */ for (i = 1; i < end_sub; ++i) { if (PyUnicode_READ(kind_self, data_self, offset + i) != PyUnicode_READ(kind_sub, data_sub, i)) return 0; } return 1; }
msg174441 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-11-01 19:08
The result does not depend on the direction of comparison. This only affects speed. But who can to say in which direction comparison will be faster? Here I see a one obvious opportunity for optimization: if (kind_self < kind_sub) return 0; After that and after processing the case (kind_self == kind_sub) only 3 special cases left: UCS1 in UCS2, UCS1 in UCS4, and UCS2 in UCS4. Get rid of slow PyUnicode_READ() for this cases will speed up the code. Also note that comparing first and last characters before memcmp can be a slowdown (because PyUnicode_READ() is slow). Try to compare first and last bytes.
msg174847 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-11-04 23:51
Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function can fail.
msg174884 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2012-11-05 09:22
> Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function > can fail. But it does. .. c:function:: int PyUnicode_Tailmatch(PyObject str, PyObject substr, \ Py_ssize_t start, Py_ssize_t end, int direction) Return 1 if substr matches ``str[start:end]`` at the given tail end (direction == -1 means to do a prefix match, direction == 1 a suffix match), 0 otherwise. Return ``-1`` if an error occurred.
msg174893 - (view)	Author: STINNER Victor (vstinner) *	Date: 2012-11-05 11:36
>> Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function >> can fail. > > But it does. > > .. c:function:: int PyUnicode_Tailmatch(PyObject str, PyObject substr, \ > Py_ssize_t start, Py_ssize_t end, int direction) > > Return 1 if substr matches ``str[start:end]`` at the given tail end > (direction == -1 means to do a prefix match, direction == 1 a suffix match), > 0 otherwise. Return ``-1`` if an error occurred. Oh, I read the "documentation" in unicodeobject.h: /* Return 1 if substr matches str[start:end] at the given tail end, 0 otherwise. */ The problem is that tailmatch() returns 0 if PyUnicode_READY() failed. It is a bug, even if it cannot occur because PyUnicode_Tailmatch() checks that the both strings are ready.
msg178901 - (view)	Author: Roundup Robot (python-dev)	Date: 2013-01-03 02:20
New changeset 49eb2488145d by Victor Stinner in branch 'default': Close #16281: handle tailmatch() failure and remove useless comment http://hg.python.org/cpython/rev/49eb2488145d
msg178902 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-01-03 02:21
"Here I see a one obvious opportunity for optimization: ..." @Serhiy: Can you please open a new issue for this? I consider the issue as fixed: I just removed the TODO (for the reason explained in the changeset).
msg178941 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-01-03 12:43
Shouldn't this be applied to 3.3? As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope.
msg178944 - (view)	Author: STINNER Victor (vstinner) *	Date: 2013-01-03 13:38
> Shouldn't this be applied to 3.3? It's just a cleanup, it doesn't fix any real bug. I prefer to not pollute old versions with cleanup. > As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope. Ok, agreed.

History
Date	User	Action	Args
2022-04-11 14:57:37	admin	set	github: 60485
2013-01-03 13:38:55	vstinner	set	messages: +
2013-01-03 12:43:01	serhiy.storchaka	set	messages: +
2013-01-03 02:21:46	vstinner	set	messages: +
2013-01-03 02:20:27	python-dev	set	status: open -> closednosy: + python-devmessages: + resolution: fixedstage: needs patch -> resolved
2012-11-15 20:29:14	serhiy.storchaka	set	components: + Interpreter Corestage: needs patch
2012-11-05 11:36:08	vstinner	set	messages: +
2012-11-05 09:22:00	serhiy.storchaka	set	messages: +
2012-11-04 23:51:09	vstinner	set	messages: +
2012-11-01 19:08:04	serhiy.storchaka	set	messages: +
2012-10-19 00:51:23	vstinner	create