Speed up tokenizing of a row in csv and xstrtod parsing by vnlitvinov · Pull Request #25784 · pandas-dev/pandas

The results are even more promising when I allow more warm-up and sampling time, so that Turbo Boost and CPU frequency scaling have less impact on the measurements.
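
To illustrate why a longer warm-up and sampling window helps, here is a minimal sketch of the idea in plain Python. It is not asv's actual implementation: the helper name `timed_samples` and its parameters are hypothetical, and the real `warmup_time` and `sample_time` attributes in asv have their own semantics. The sketch only shows the general pattern of discarding an initial phase and then collecting many samples.

```python
import statistics
import time


def timed_samples(func, warmup_time=2.0, sample_time=2.0):
    """Hypothetical helper: discard a warm-up phase, then collect timings."""
    # Warm-up phase: call func without recording anything, giving the CPU
    # clock time to settle before measurements start.
    deadline = time.perf_counter() + warmup_time
    while time.perf_counter() < deadline:
        func()

    # Sampling phase: record one duration per call until the time budget
    # is spent (always taking at least one sample).
    samples = []
    deadline = time.perf_counter() + sample_time
    while not samples or time.perf_counter() < deadline:
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


# Short budgets so the example finishes quickly; the run above used 2 s each.
samples = timed_samples(lambda: sum(range(10_000)), warmup_time=0.05, sample_time=0.05)
print(f"{len(samples)} samples, median {statistics.median(samples) * 1e6:.1f} µs")
```

Reporting the median of many samples rather than a single timing is one common way to reduce the influence of short-lived frequency changes.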

Running asv continuous -f 1.05 origin/master HEAD -b io.csv -a sample_time=2 -a warmup_time=2 yields:

before	after	ratio	test name
[e8d951d]	[a4f6dcd]
master	speed-up-tokenizer
34.9±0.2ms	32.9±1ms	0.94	io.csv.ReadCSVCategorical.time_convert_direct
13.5±0.02ms	12.6±0.08ms	0.93	io.csv.ReadCSVThousands.time_thousands(',', ',')
5.10±0.07ms	4.71±0.09ms	0.92	io.csv.ReadUint64Integers.time_read_uint64_neg_values
14.6±0.06ms	13.0±0.04ms	0.89	io.csv.ReadCSVThousands.time_thousands('|', ',')
16.0±0.3ms	13.8±0.09ms	0.86	io.csv.ReadCSVSkipRows.time_skipprows(None)
10.3±0.1ms	8.81±0.1ms	0.86	io.csv.ReadCSVSkipRows.time_skipprows(10000)
12.9±0.1ms	10.7±0.09ms	0.84	io.csv.ReadCSVThousands.time_thousands('|', None)
13.0±0.05ms	10.8±0.08ms	0.83	io.csv.ReadCSVThousands.time_thousands(',', None)
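
For reading the table: the ratio column is the "after" time divided by the "before" time, so values below 1 are speed-ups. The sketch below recomputes the ratio for a few rows copied from the table and checks them against the `-f 1.05` factor, assuming that a benchmark counts as improved when its ratio falls below 1/1.05 (asv also applies statistical checks that are not modelled here). The function names are illustrative only. Because the displayed timings are rounded, recomputed ratios can differ from the reported ones in the last digit.

```python
# Rows copied from the table above: (before ms, after ms, reported ratio).
rows = {
    "ReadCSVCategorical.time_convert_direct": (34.9, 32.9, 0.94),
    "ReadCSVThousands.time_thousands(',', ',')": (13.5, 12.6, 0.93),
    "ReadCSVSkipRows.time_skipprows(None)": (16.0, 13.8, 0.86),
    "ReadCSVThousands.time_thousands(',', None)": (13.0, 10.8, 0.83),
}

FACTOR = 1.05  # the value passed as -f on the command line


def ratio(before, after):
    """Ratio as shown in the table: after / before (below 1 means faster)."""
    return after / before


def is_improvement(before, after, factor=FACTOR):
    """Assumed threshold rule: improved if the ratio is below 1 / factor."""
    return ratio(before, after) < 1 / factor


for name, (before, after, reported) in rows.items():
    r = ratio(before, after)
    saved = (1 - r) * 100  # percentage of run time saved
    print(f"{name}: ratio {r:.2f} (reported {reported}), {saved:.1f}% faster, "
          f"improvement={is_improvement(before, after)}")
```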