Series.map should return default dictionary values rather than NaN
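`collections.Counter` and `collections.defaultdict` both supply a default value for missing keys (through `__missing__`). However, `pandas.Series.map` does not respect these defaults and instead returns missing values (NaN) for any key that is not actually stored in the dictionary.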
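For reference, here is how the two defaults behave in plain Python, without pandas. Note the difference: `Counter` returns 0 for an absent key without storing it, while `defaultdict` calls its `default_factory` and inserts the result.

```python
from collections import Counter, defaultdict

counter = Counter()
counter[1] += 1

# Counter.__missing__ returns 0 for an absent key and does NOT insert it.
print(counter[3])    # 0
print(3 in counter)  # False

# defaultdict calls default_factory on a missing key AND stores the result.
dd = defaultdict(int)
print(dd["a"])       # 0
print("a" in dd)     # True
```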
The issue is illustrated below:
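```python
import pandas
from collections import Counter, defaultdict

series = pandas.Series(range(5))

# Counter with a single entry; every other key should default to 0.
counter = Counter()
counter[1] += 1

output = series.map(counter)                    # dict-style mapping
expected = series.map(lambda x: counter[x])     # explicit __getitem__ per element

print(pandas.DataFrame({
    'input': series,
    'output': output,
    'expected': expected,
}))
```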
Here's the output:
```
   expected  input  output
0         0      0     NaN
1         1      1     1.0
2         0      2     NaN
3         0      3     NaN
4         0      4     NaN
```
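The NaN values are consistent with a lookup path that never goes through `__getitem__`. The stdlib-only check below illustrates the Python semantics involved (it is not a description of pandas' internals): `__missing__` is consulted only by `mapping[key]`, while `.get()` and a copy into a plain `dict` both ignore it.

```python
from collections import Counter

counter = Counter({1: 1})

# __missing__ is only consulted by __getitem__ (counter[key]).
print(counter[0])                   # 0, via Counter.__missing__

# dict.get bypasses __missing__ and falls back to its own default (None here).
print(counter.get(0))               # None

# Copying into a plain dict drops the subclass and its __missing__ hook.
plain = dict(counter)
print(plain.get(0, float("nan")))   # nan
```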
The workaround is easy (`series.map(lambda x: dictionary[x])`), and the change shouldn't be too hard to implement. Are people on board with the change? Is there a performance concern with looking up each key individually?
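A minimal sketch of the proposed semantics, using a hypothetical helper `map_respecting_missing` that works on plain Python iterables rather than a real `Series`. The assumption is that only `dict` subclasses defining `__missing__` take the per-element `__getitem__` path (the same cost as the lambda workaround), while plain mappings keep NaN for missing keys.

```python
import math
from collections import Counter, defaultdict
from collections.abc import Mapping


def map_respecting_missing(values, mapper):
    """Hypothetical helper sketching the proposed Series.map semantics.

    - dict subclasses that define __missing__ (Counter, defaultdict) are
      looked up with mapper[key], so their default is used instead of NaN;
    - other mappings keep the NaN-for-missing-key behaviour;
    - anything else is treated as a function and applied per element.
    """
    if isinstance(mapper, dict) and hasattr(type(mapper), "__missing__"):
        # One __getitem__ call per element, same cost as the lambda workaround.
        # Note: for a defaultdict this also inserts every missing key.
        return [mapper[v] for v in values]
    if isinstance(mapper, Mapping):
        # Plain mapping: missing keys map to NaN.
        return [mapper.get(v, math.nan) for v in values]
    return [mapper(v) for v in values]


counter = Counter()
counter[1] += 1
print(map_respecting_missing(range(5), counter))           # [0, 1, 0, 0, 0]
print(map_respecting_missing(range(5), {1: 1}))            # [nan, 1, nan, nan, nan]
print(map_respecting_missing([1, 2], defaultdict(int)))    # [0, 0]
```

With this split, only mappings that actually carry a default pay the per-key cost; plain dicts could stay on whatever faster path is already in place. The `defaultdict` side effect (missing keys get inserted) is worth keeping in mind if the change goes ahead.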