Issue 28921: Make str.count one character for latin1 string faster (original) (raw)

Created on 2016-12-09 14:38 by xiang.zhang, last changed 2022-04-11 14:58 by admin.

Files
File name	Uploaded	Description	Edit
latin1-str-count-one-character.patch	xiang.zhang,2016-12-09 14:38	review

Messages (5)
msg282784 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-12-09 14:38
I make a try to improve str.count one character for latin1 string. From benchmark, it could enhance the speed depending on the length of the string. # short one, a little regression ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefg"' 's.count("a")' python: ..................... 190 ns +- 1 ns python3: ..................... 197 ns +- 2 ns Median +- std dev: [python] 190 ns +- 1 ns -> [python3] 197 ns +- 2 ns: 1.04x slower # a little longer ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghi"' 's.count("a")' python: ..................... 192 ns +- 15 ns python3: ..................... 184 ns +- 2 ns Median +- std dev: [python] 192 ns +- 15 ns -> [python3] 184 ns +- 2 ns: 1.05x faster # longer ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&()-=_+{}\|"' 's.count("a")' python: ..................... 236 ns +- 12 ns python3: ..................... 209 ns +- 9 ns Median +- std dev: [python] 236 ns +- 12 ns -> [python3] 209 ns +- 9 ns: 1.13x faster # very long ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefgHijk"100' 's.count("z")' python: ..................... 1.02 us +- 0.00 us python3: ..................... 610 ns +- 1 ns Median +- std dev: [python] 1.02 us +- 0.00 us -> [python3] 610 ns +- 1 ns: 1.67x faster And since str.replace may also go through the code path involving count, it's somewhat affected: ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&*()-=_+{}	"' 's.replace("a", "ccc")' python: ..................... 318 ns +- 31 ns python3: ..................... 298 ns +- 19 ns Median +- std dev: [python] 318 ns +- 31 ns -> [python3] 298 ns +- 19 ns: 1.07x faster For ucs2 and ucs4, the benchmarks are not appealing so leave them untouched.
msg282785 - (view)	Author: STINNER Victor (vstinner) *	Date: 2016-12-09 14:53
Hum, Serhiy is better than me to review such hardcore C code :-) I let him review the patch ;-) > short one, a little regression In stringlib, the usual solution is to use a threshold: use a dummy loop for less than N bytes, otherwise use the ultra-optimized loop. Serhiy event implemented a "dynamic" threshold in some functions, when too many false positive are found. I don't recall where. "And since str.replace may also go through the code path involving count, it's somewhat affected: (...) 1.07x faster" I'm not really excited by optimizing str.count, since I don't think that this function is commonly used. But if str.replace is made faster, I'm interested :-) I understand that count() is only used when the old and new patterns of str.replace() have a different length.
msg282800 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-12-09 19:33
> I understand that count() is only used when the old and new patterns of str.replace() have a different length. Yes. I thought it won't help much since str.replace get many operations. But for long string, looks good: ./python3 -m perf timeit --compare-to ~/cpython/python -s 's="abcdefghihijklmnopqrstuvwxyz~!@##$%^&()-=_+{}\|"100' 's.replace("a", "bc")' python: ..................... 7.36 us +- 0.04 us python3: ..................... 4.91 us +- 0.04 us Median +- std dev: [python] 7.36 us +- 0.04 us -> [python3] 4.91 us +- 0.04 us: 1.50x faster # 50% ??!! how? And this patch also applies to bytes since they share codes.
msg282803 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2016-12-09 20:46
The code looks too complex. I think that it is not worth to complicate the code so much for optimizing just few non-critical string operations. Many optimizations were rejected in the past due to high cost and low benefit.
msg283027 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2016-12-12 16:49
> The code looks too complex. It is if looking at the patch alone. But the skill is used across stringlib and the only thing new is the bitwise operation. str/bytes.count is not critical but not bad if it could be optimized and especially the effect is significant for long data. I am not favour of rejecting it now but set priority to low.

History
Date	User	Action	Args
2022-04-11 14:58:40	admin	set	github: 73107
2016-12-12 16:49:06	xiang.zhang	set	priority: normal -> lowmessages: +
2016-12-09 20:46:43	serhiy.storchaka	set	messages: +
2016-12-09 19:33:28	xiang.zhang	set	messages: +
2016-12-09 14:53:28	vstinner	set	messages: +
2016-12-09 14:38:24	xiang.zhang	create