Speed up tokenizing of a row in csv and xstrtod parsing by vnlitvinov · Pull Request #25784 · pandas-dev/pandas
The results are even more promising if I allow more warmup and sampling time, so that Turbo Boost and frequency scaling have less impact on the measurements.
Running `asv continuous -f 1.05 origin/master HEAD -b io.csv -a sample_time=2 -a warmup_time=2` yields:
| before [e8d951d] (master) | after [a4f6dcd] (speed-up-tokenizer) | ratio | test name |
|---|---|---|---|
| 34.9±0.2ms | 32.9±1ms | 0.94 | io.csv.ReadCSVCategorical.time_convert_direct |
| 13.5±0.02ms | 12.6±0.08ms | 0.93 | io.csv.ReadCSVThousands.time_thousands(',', ',') |
| 5.10±0.07ms | 4.71±0.09ms | 0.92 | io.csv.ReadUint64Integers.time_read_uint64_neg_values |
| 14.6±0.06ms | 13.0±0.04ms | 0.89 | io.csv.ReadCSVThousands.time_thousands('\|', ',') |
| 16.0±0.3ms | 13.8±0.09ms | 0.86 | io.csv.ReadCSVSkipRows.time_skipprows(None) |
| 10.3±0.1ms | 8.81±0.1ms | 0.86 | io.csv.ReadCSVSkipRows.time_skipprows(10000) |
| 12.9±0.1ms | 10.7±0.09ms | 0.84 | io.csv.ReadCSVThousands.time_thousands('\|', None) |
| 13.0±0.05ms | 10.8±0.08ms | 0.83 | io.csv.ReadCSVThousands.time_thousands(',', None) |
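
For anyone reading the ratio column: it is the after time divided by the before time, so values below 1 are speed-ups. The sketch below recomputes the ratios from the rounded timings in the table and checks them against the `-f 1.05` threshold. It is only an illustration: the `RESULTS` dictionary and the `summarize` helper are made up for this comment, and it assumes the factor is applied symmetrically (a change is reported when the ratio is above `f` or below `1/f`). Because the displayed timings are rounded, a recomputed ratio can differ from the reported one by about 0.01.

```python
# Timings in ms copied from the table above: (before, after, reported ratio).
# RESULTS and summarize() are hypothetical names used only for this sketch.
RESULTS = {
    "ReadCSVCategorical.time_convert_direct": (34.9, 32.9, 0.94),
    "ReadCSVThousands.time_thousands(',', ',')": (13.5, 12.6, 0.93),
    "ReadUint64Integers.time_read_uint64_neg_values": (5.10, 4.71, 0.92),
    "ReadCSVThousands.time_thousands('|', ',')": (14.6, 13.0, 0.89),
    "ReadCSVSkipRows.time_skipprows(None)": (16.0, 13.8, 0.86),
    "ReadCSVSkipRows.time_skipprows(10000)": (10.3, 8.81, 0.86),
    "ReadCSVThousands.time_thousands('|', None)": (12.9, 10.7, 0.84),
    "ReadCSVThousands.time_thousands(',', None)": (13.0, 10.8, 0.83),
}

FACTOR = 1.05  # the value passed with -f in the command above


def summarize(results, factor=FACTOR):
    """Return (name, recomputed ratio, reported ratio, exceeds threshold) rows."""
    rows = []
    for name, (before, after, reported) in results.items():
        ratio = after / before
        # Assumption: the threshold is symmetric around 1.
        changed = ratio > factor or ratio < 1 / factor
        rows.append((name, round(ratio, 2), reported, changed))
    return rows


if __name__ == "__main__":
    for name, ratio, reported, changed in summarize(RESULTS):
        saved = (1 - ratio) * 100  # percent of runtime saved
        print(f"{name}: ratio {ratio:.2f} (reported {reported:.2f}), "
              f"{saved:.0f}% faster, beyond threshold: {changed}")
```

Under that assumption every row in the table clears the 5% threshold, with the `time_thousands` benchmarks without a decimal separator improving the most.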