Issue 28943: Use PyUnicode_MAX_CHAR_VALUE instead of PyUnicode_KIND in some API's short path

Created on 2016-12-12 11:22 by xiang.zhang, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: short-path.patch (uploaded by xiang.zhang, 2016-12-12 11:22)
Messages (3)
msg282982 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-12-12 11:22
Some Unicode APIs, such as PyUnicode_Contains, take a short path by comparing kinds. But this has a problem: it cannot apply to ASCII and Latin-1 strings, which share the same kind. PyUnicode_MAX_CHAR_VALUE could be used instead to make the short path also apply to ASCII and Latin-1. This trick is already used in PyUnicode_Replace.
msg282983 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-12-12 11:37
PyUnicode_KIND() just extracts three bits from the state word. PyUnicode_MAX_CHAR_VALUE() extracts bits multiple times and does some conditional branching. I think it is much slower than PyUnicode_KIND(). In the common case you search for an ASCII needle or a needle of the same kind as the string, so checking for the fast path just adds overhead. That is acceptable only while the overhead is tiny. Optimize common cases, not rare and obscure ones.
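
For readers following the trade-off, here is a minimal, self-contained sketch of the two checks being compared. It is a simplified model, not CPython's actual code: the `str_state` struct, the `KIND_*` constants, and the helper names `get_kind`, `max_char_value`, `cannot_contain_by_kind`, and `cannot_contain_by_maxchar` are all hypothetical stand-ins, and the model assumes strings are stored in the narrowest representation that fits them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-string state; NOT CPython's real layout. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

typedef struct {
    unsigned kind  : 3;  /* bytes of storage per character */
    unsigned ascii : 1;  /* set when every character is below 128 */
} str_state;

/* Modeled on PyUnicode_KIND(): a single bit-field read. */
static inline unsigned get_kind(str_state s)
{
    return s.kind;
}

/* Modeled on PyUnicode_MAX_CHAR_VALUE(): several reads plus branches. */
static inline uint32_t max_char_value(str_state s)
{
    if (s.ascii)
        return 0x7f;
    if (s.kind == KIND_1BYTE)
        return 0xff;
    if (s.kind == KIND_2BYTE)
        return 0xffff;
    return 0x10ffff;
}

/* Short path by kind: a needle that needs wider storage cannot occur. */
static inline bool cannot_contain_by_kind(str_state hay, str_state needle)
{
    return get_kind(needle) > get_kind(hay);
}

/* Short path by max char: additionally rejects a Latin-1 needle in an
 * ASCII haystack, at the cost of the extra branches above. */
static inline bool cannot_contain_by_maxchar(str_state hay, str_state needle)
{
    return max_char_value(needle) > max_char_value(hay);
}
```

In this model, an ASCII haystack and a non-ASCII Latin-1 needle both have the 1-byte kind, so the kind check lets the search proceed while the max-char check rejects it early. That extra case is the only gain, and it is paid for with extra branches on every call, including the common ones where neither short path fires.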
msg282990 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-12-12 12:44
I know the difference and thought the overhead should be tiny (not in a critical part). But benchmarks show it's not. :-(
History
Date User Action Args
2022-04-11 14:58:40 admin set github: 73129
2016-12-12 12:45:04 xiang.zhang set resolution: rejected
2016-12-12 12:44:42 xiang.zhang set status: open -> closed; messages: +; stage: patch review -> resolved
2016-12-12 11:37:38 serhiy.storchaka set messages: +
2016-12-12 11:22:10 xiang.zhang create