Speed up tokenizing of a row in csv and xstrtod parsing by vnlitvinov · Pull Request #25784 · pandas-dev/pandas

The results are even more promising when I allow more warmup and sampling time, so that Turbo Boost and CPU frequency scaling have less effect on the measurements.

Running `asv continuous -f 1.05 origin/master HEAD -b io.csv -a sample_time=2 -a warmup_time=2` yields:

    before         after          ratio  test name
    [e8d951d]      [a4f6dcd]
    master         speed-up-tokenizer
    34.9±0.2ms     32.9±1ms       0.94   io.csv.ReadCSVCategorical.time_convert_direct
    13.5±0.02ms    12.6±0.08ms    0.93   io.csv.ReadCSVThousands.time_thousands(',', ',')
    5.10±0.07ms    4.71±0.09ms    0.92   io.csv.ReadUint64Integers.time_read_uint64_neg_values
    14.6±0.06ms    13.0±0.04ms    0.89   io.csv.ReadCSVThousands.time_thousands('|', ',')
    16.0±0.3ms     13.8±0.09ms    0.86   io.csv.ReadCSVSkipRows.time_skipprows(None)
    10.3±0.1ms     8.81±0.1ms     0.86   io.csv.ReadCSVSkipRows.time_skipprows(10000)
    12.9±0.1ms     10.7±0.09ms    0.84   io.csv.ReadCSVThousands.time_thousands('|', None)
    13.0±0.05ms    10.8±0.08ms    0.83   io.csv.ReadCSVThousands.time_thousands(',', None)
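For anyone who wants to sanity-check the ratio column, here is a minimal sketch that recomputes after/before from the rounded timings listed above. The names `TIMINGS`, `ratio` and `is_reported_improvement` are mine, not part of asv, and the threshold check assumes that `-f 1.05` makes asv report an improvement only when the ratio falls below 1/1.05 (about 0.95). Because the inputs are the rounded values as displayed, a recomputed ratio can differ from the reported one by about 0.01.

```python
# Timings in milliseconds copied from the table above: (before, after, reported ratio).
# These are the rounded values as displayed, not the raw asv samples.
TIMINGS = {
    "ReadCSVCategorical.time_convert_direct": (34.9, 32.9, 0.94),
    "ReadCSVThousands.time_thousands(',', ',')": (13.5, 12.6, 0.93),
    "ReadUint64Integers.time_read_uint64_neg_values": (5.10, 4.71, 0.92),
    "ReadCSVThousands.time_thousands('|', ',')": (14.6, 13.0, 0.89),
    "ReadCSVSkipRows.time_skipprows(None)": (16.0, 13.8, 0.86),
    "ReadCSVSkipRows.time_skipprows(10000)": (10.3, 8.81, 0.86),
    "ReadCSVThousands.time_thousands('|', None)": (12.9, 10.7, 0.84),
    "ReadCSVThousands.time_thousands(',', None)": (13.0, 10.8, 0.83),
}

FACTOR = 1.05  # the value passed to `asv continuous -f`


def ratio(before_ms, after_ms):
    """Return after/before; values below 1.0 mean the branch is faster."""
    return after_ms / before_ms


def is_reported_improvement(r, factor=FACTOR):
    """Assumption: a ratio below 1/factor counts as a reportable improvement."""
    return r < 1.0 / factor


for name, (before, after, reported) in TIMINGS.items():
    r = ratio(before, after)
    saved = (1.0 - r) * 100.0  # percentage of the original run time saved
    print(f"{name}: ratio {r:.3f} (reported {reported:.2f}), {saved:.1f}% less time")
```

Read this way, the listed benchmarks take roughly 6% to 17% less time on the branch than on master.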