ENH: Add normalize to crosstab · Issue #12569 · pandas-dev/pandas
It'd be great to have a simple normalization option for `crosstab` that returns shares rather than frequencies, i.e. something that produces the same result as:
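```python
import pandas as pd
# w_mobile: a DataFrame with 'language' and 'carrier' columns
def normalize(x):
    return len(x) / len(w_mobile.language)  # this cell's share of all rows
pd.crosstab(w_mobile.language, w_mobile.carrier, values=w_mobile.language, aggfunc=normalize)
```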
but as a built-in option.
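For the overall-share case, a custom `aggfunc` is not strictly needed. A minimal sketch: build the plain count table with `pd.crosstab` and divide every cell by the grand total. `w_mobile` is not shown in the issue, so this assumes a small stand-in DataFrame with the same `language` and `carrier` columns.

```python
import pandas as pd

# Stand-in for w_mobile (assumption: same 'language' and 'carrier' columns)
df = pd.DataFrame({'carrier': ['a', 'a', 'b', 'b', 'b'],
                   'language': ['english', 'spanish', 'english', 'spanish', 'spanish']})

counts = pd.crosstab(df.language, df.carrier)  # raw frequencies per (language, carrier)
shares = counts / counts.to_numpy().sum()      # each cell divided by the grand total
print(shares)
```

Because the division is vectorized over the whole table, it avoids calling a Python function once per cell as the `aggfunc` approach does.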
The ability to normalize by row or by column would also be great, so that all entries in a row (or all entries in a column) sum to 1. The behavior would be similar to the following, which computes each carrier's language shares so that each column of the result sums to 1:
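```python
import pandas as pd

df = pd.DataFrame({'carrier': ['a', 'a', 'b', 'b', 'b'],
                   'language': ['english', 'spanish', 'english', 'spanish', 'spanish']})
l = list()
for i in df.carrier.unique():
    # language shares within carrier i
    temp = df.query('carrier=="{}"'.format(i)).language.value_counts(normalize=True)
    temp.name = i  # becomes the column label after concat
    l.append(temp)
ctab = pd.concat(l, axis=1)  # one column per carrier
ctab
```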
```
Out[1]:
           a         b
english  0.5  0.333333
spanish  0.5  0.666667
```
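The loop can already be replaced by dividing the count table by its column totals (or, for the row case, its row totals). A sketch using the same toy `df`; it reproduces the table shown above.

```python
import pandas as pd

df = pd.DataFrame({'carrier': ['a', 'a', 'b', 'b', 'b'],
                   'language': ['english', 'spanish', 'english', 'spanish', 'spanish']})
counts = pd.crosstab(df.language, df.carrier)

# Divide each column by its own total, so every carrier's shares sum to 1
col_shares = counts.div(counts.sum(axis=0), axis=1)
# Divide each row by its own total, so every language's shares sum to 1
row_shares = counts.div(counts.sum(axis=1), axis=0)
print(col_shares)
print(row_shares)
```

The `axis` argument of `DataFrame.div` controls which labels the totals are aligned against: `axis=1` matches the column totals to the columns, `axis=0` matches the row totals to the index.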
But with a single command, for example (proposed syntax): `pd.crosstab(df.language, df.carrier, normalization='row')`
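A sketch of how the proposed option could behave, written as a hypothetical helper `crosstab_normalized` (not part of pandas). The `normalization` values `'all'`, `'row'` and `'column'` are assumptions modeled on the proposal; here `'row'` means every row sums to 1, matching the verbal description of row normalization.

```python
import pandas as pd


def crosstab_normalized(index, columns, normalization=None):
    """Hypothetical wrapper around pd.crosstab (not a pandas API).

    normalization:
      None     -> raw counts
      'all'    -> each cell divided by the grand total
      'row'    -> each row sums to 1
      'column' -> each column sums to 1
    """
    counts = pd.crosstab(index, columns)
    if normalization is None:
        return counts
    if normalization == 'all':
        return counts / counts.to_numpy().sum()
    if normalization == 'row':
        return counts.div(counts.sum(axis=1), axis=0)
    if normalization == 'column':
        return counts.div(counts.sum(axis=0), axis=1)
    raise ValueError("normalization must be None, 'all', 'row' or 'column'")


df = pd.DataFrame({'carrier': ['a', 'a', 'b', 'b', 'b'],
                   'language': ['english', 'spanish', 'english', 'spanish', 'spanish']})
print(crosstab_normalized(df.language, df.carrier, normalization='column'))
```

Later pandas releases added a built-in `normalize` argument to `pd.crosstab` (accepting `True`/`'all'`, `'index'` and `'columns'`); the helper only illustrates the requested behavior.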