pandas.DataFrame.value_counts — pandas 2.2.3 documentation
DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True)
Return a Series containing the frequency of each distinct row in the DataFrame.
Parameters:
subset : label or list of labels, optional
    Columns to use when counting unique combinations.
normalize : bool, default False
    Return proportions rather than frequencies.
sort : bool, default True
    Sort by frequencies when True. Sort by DataFrame column values when False.
ascending : bool, default False
    Sort in ascending order.
dropna : bool, default True
    Don’t include counts of rows that contain NA values.
    Added in version 1.3.0.
Returns:
Series
Notes
The returned Series will have a MultiIndex with one level per input column but an Index (non-multi) for a single label. By default, rows that contain any NA values are omitted from the result. By default, the resulting Series will be in descending order so that the first element is the most frequently-occurring row.
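The index behaviour described in these notes can be checked directly. The following is a minimal sketch that reuses the `num_legs`/`num_wings` frame from the Examples section; the variable names `by_label` and `by_rows` are illustrative only and do not come from the pandas documentation.

```python
import pandas as pd

df = pd.DataFrame(
    {"num_legs": [2, 4, 4, 6], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "cat", "ant"],
)

# Passing a single label as `subset` gives a flat (non-multi) Index.
by_label = df.value_counts("num_legs")
print(isinstance(by_label.index, pd.MultiIndex))

# Counting whole rows gives a MultiIndex with one level per input column.
by_rows = df.value_counts()
print(isinstance(by_rows.index, pd.MultiIndex), by_rows.index.nlevels)

# With the default descending sort, the first element is the most frequent row.
print(by_rows.index[0], by_rows.iloc[0])
```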
Examples
>>> df = pd.DataFrame({'num_legs': [2, 4, 4, 6],
...                    'num_wings': [2, 0, 0, 0]},
...                   index=['falcon', 'dog', 'cat', 'ant'])
>>> df
        num_legs  num_wings
falcon         2          2
dog            4          0
cat            4          0
ant            6          0

>>> df.value_counts()
num_legs  num_wings
4         0            2
2         2            1
6         0            1
Name: count, dtype: int64

>>> df.value_counts(sort=False)
num_legs  num_wings
2         2            1
4         0            2
6         0            1
Name: count, dtype: int64

>>> df.value_counts(ascending=True)
num_legs  num_wings
2         2            1
6         0            1
4         0            2
Name: count, dtype: int64

>>> df.value_counts(normalize=True)
num_legs  num_wings
4         0            0.50
2         2            0.25
6         0            0.25
Name: proportion, dtype: float64
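One way to reason about the `sort` and `ascending` examples is to compare them with a grouped row count. The sketch below assumes that `value_counts` behaves like `groupby(...).size()` followed by an optional `sort_values`. It illustrates the observable behaviour on this small frame and is not a statement about how the library is implemented; the names `manual` and `manual_sorted` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"num_legs": [2, 4, 4, 6], "num_wings": [2, 0, 0, 0]},
    index=["falcon", "dog", "cat", "ant"],
)

cols = df.columns.tolist()

# Group on every column and count the rows in each group.
# Groups come out ordered by their key values, which matches sort=False.
manual = df.groupby(cols).size()

# Ordering by count, largest first, mirrors the defaults sort=True, ascending=False.
manual_sorted = manual.sort_values(ascending=False)

# Compare the hand-built counts with value_counts (dict comparison ignores order).
print(manual.to_dict() == df.value_counts(sort=False).to_dict())
print(manual_sorted.to_dict() == df.value_counts().to_dict())
```

Rows with equal counts may appear in a different relative order between the two approaches, which is why the comparison above ignores ordering.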
With dropna set to False we can also count rows with NA values.
>>> df = pd.DataFrame({'first_name': ['John', 'Anne', 'John', 'Beth'],
...                    'middle_name': ['Smith', pd.NA, pd.NA, 'Louise']})
>>> df
  first_name middle_name
0       John       Smith
1       Anne        <NA>
2       John        <NA>
3       Beth      Louise

>>> df.value_counts()
first_name  middle_name
Beth        Louise         1
John        Smith          1
Name: count, dtype: int64

>>> df.value_counts(dropna=False)
first_name  middle_name
Anne        NaN            1
Beth        Louise         1
John        Smith          1
            NaN            1
Name: count, dtype: int64

>>> df.value_counts("first_name")
first_name
John    2
Anne    1
Beth    1
Name: count, dtype: int64
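The parameters can also be combined. As a small sketch, the snippet below counts a single column with `subset`, keeps missing values with `dropna=False`, and returns proportions with `normalize=True`. The variable names `shares` and `missing_share` are illustrative and not part of the pandas API.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_name": ["John", "Anne", "John", "Beth"],
        "middle_name": ["Smith", pd.NA, pd.NA, "Louise"],
    }
)

# Proportion of rows per middle name, keeping missing values as their own group.
shares = df.value_counts("middle_name", normalize=True, dropna=False)
print(shares)

# Proportion of rows whose middle name is missing.
missing_share = shares[shares.index.isna()].sum()
print(missing_share)
```

Because `normalize=True` divides by the total number of counted rows, the proportions sum to 1 whether or not NA rows are kept; with `dropna=True` the NA rows are excluded before the proportions are computed.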