Issue 28943: Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path


Created on 2016-12-12 11:22 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded by	Date
short-path.patch	xiang.zhang	2016-12-12 11:22

Messages (3)
msg282982	Author: Xiang Zhang (xiang.zhang)	Date: 2016-12-12 11:22
Some Unicode APIs, like PyUnicode_Contains, take a short path by comparing kinds: if the substring's kind is wider than the string's, it cannot be contained. But this check cannot apply to ASCII and Latin-1, because both are stored with the same 1-byte kind. PyUnicode_MAX_CHAR_VALUE could be used instead so that the short path also applies to ASCII and Latin-1. This technique is already used in PyUnicode_Replace.
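The difference between the two short paths can be seen in a small model. This is a minimal sketch, not CPython code: `str_header`, `max_char_value`, `skip_by_kind` and `skip_by_max_char` are hypothetical names. The kind values mirror `PyUnicode_1BYTE_KIND` (1), `PyUnicode_2BYTE_KIND` (2) and `PyUnicode_4BYTE_KIND` (4), and `max_char_value` copies the branch structure of the `PyUnicode_MAX_CHAR_VALUE` macro. It assumes both strings are in canonical PEP 393 form, i.e. stored in the narrowest kind that fits their contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PEP 393 string header, keeping only the
   fields the short paths look at. */
typedef struct {
    unsigned int kind; /* 1, 2 or 4 bytes per code point */
    bool ascii;        /* true if every code point is < 128 */
} str_header;

/* Modelled on PyUnicode_MAX_CHAR_VALUE(): the largest code point the
   string's storage can hold, with ASCII special-cased. */
static uint32_t max_char_value(const str_header *s)
{
    assert(s->kind == 1 || s->kind == 2 || s->kind == 4);
    if (s->ascii)
        return 0x7f;
    if (s->kind == 1)
        return 0xff;
    if (s->kind == 2)
        return 0xffff;
    return 0x10ffff;
}

/* Kind-based short path: returns true when the needle needs a wider
   kind than the haystack, so the search can be skipped. */
static bool skip_by_kind(const str_header *haystack, const str_header *needle)
{
    return needle->kind > haystack->kind;
}

/* Max-char-based short path, as proposed in this issue: also catches a
   non-ASCII Latin-1 needle searched in an ASCII haystack. */
static bool skip_by_max_char(const str_header *haystack, const str_header *needle)
{
    return max_char_value(needle) > max_char_value(haystack);
}
```

With an ASCII haystack such as "abc" and a needle containing "é" (U+00E9), both headers have kind 1, so `skip_by_kind()` returns false and the full search still runs, while `skip_by_max_char()` returns true because 0xff > 0x7f.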
msg282983	Author: Serhiy Storchaka (serhiy.storchaka)	Date: 2016-12-12 11:37
PyUnicode_KIND() just extracts three bits from the state word. PyUnicode_MAX_CHAR_VALUE() extracts bits several times and does some conditional branching. I think it is much slower than PyUnicode_KIND(). In the common case you search for an ASCII needle or a needle of the same kind as the string, so checking for the fast path just adds overhead. That is acceptable only as long as the overhead is tiny. Optimize common cases, not rare and obscure ones.
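To make the cost argument concrete, here is a sketch of the two accessors over a copy of the string's state bitfield. The struct and function names (`str_state`, `state_kind`, `state_max_char`) are hypothetical; the field layout is modelled on `PyASCIIObject.state` in CPython 3.6, and this is not the actual macro code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the PEP 393 state bitfield layout. */
typedef struct {
    unsigned int interned : 2;
    unsigned int kind : 3;    /* 1, 2 or 4 */
    unsigned int compact : 1;
    unsigned int ascii : 1;
    unsigned int ready : 1;
} str_state;

/* Like PyUnicode_KIND(): a single three-bit field extraction, no branches. */
static unsigned int state_kind(const str_state *st)
{
    return st->kind;
}

/* Like PyUnicode_MAX_CHAR_VALUE(): reads the ascii bit, then the kind
   bits (possibly twice), with up to three conditional branches. */
static uint32_t state_max_char(const str_state *st)
{
    assert(st->ready);
    if (st->ascii)
        return 0x7f;
    if (st->kind == 1)
        return 0xff;
    if (st->kind == 2)
        return 0xffff;
    return 0x10ffff;
}
```

`state_kind()` typically compiles to a load, shift and mask, while `state_max_char()` adds compare-and-branch steps on every call. That extra work is paid even in the common same-kind case, where the kind comparison alone already gives the right answer.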
msg282990	Author: Xiang Zhang (xiang.zhang)	Date: 2016-12-12 12:44
I knew the difference and thought the overhead would be tiny (since the check is not in a critical part). But benchmarks show it is not. :-(

History
Date	User	Action	Args
2022-04-11 14:58:40	admin	set	github: 73129
2016-12-12 12:45:04	xiang.zhang	set	resolution: rejected
2016-12-12 12:44:42	xiang.zhang	set	status: open -> closed; messages: +; stage: patch review -> resolved
2016-12-12 11:37:38	serhiy.storchaka	set	messages: +
2016-12-12 11:22:10	xiang.zhang	create